At the beginning of my PhD I struggled to find good data. I wanted data with good measures of mental health, physical health, employment, income, small area indicators and standard controls like age, gender, education and so on. I eventually found what I needed, but it required looking at many datasets. I am sure other people have faced similar problems. This post attempts to give you a bit of a shortcut by providing an overview of datasets that might be interesting to health economists who work empirically.
The focus here is on datasets which allow following individuals over time (even if they weren’t specifically meant as panels). The reason for this is simply that I have more knowledge of these datasets and these alone already bring this post to a substantial length.
The selection criteria for datasets to make it into this post are quite simple: I have to know of their existence and they have to be available in English. I cannot vouch for the quality of all the datasets, as I have only used a few of them. In addition, whether a dataset is good enough depends on your research question.
I will certainly have missed one or two interesting datasets, so I am curious about your comments and additions at the bottom.
The reason to include only English-language datasets is simply that using a dataset in a foreign language is a pain, though it is possible! I have a forthcoming publication using a Portuguese dataset without speaking a single word of Portuguese. However, it took me several highly disciplined days of using Google Translate to find all the variables I needed. It worked out for me in the end, but I would not encourage it! The reverse is true if you are proficient in another language: using data from that country for research that has so far relied only on UK/US data might be part of your competitive advantage.
Naturally, the table below is not exhaustive, and as already mentioned I am looking forward to learning about other interesting surveys. First, however, a few words of warning. Most of my “warnings” are straightforward, but better safe than sorry, right?
- Not all of these datasets are free to researchers. In fact, some of them come at substantial cost, especially the administrative ones. I would advise you to narrow down your options and do your own cost-benefit analysis to determine which dataset you will use. Your supervisor and/or colleagues can probably give you some advice.
- Merging and cleaning most of these datasets is a substantial amount of work. Economies of scale therefore favour sticking to one dataset if possible. Some datasets are so complex that there are dedicated workshops on them. ISER at Essex, for example, offers workshops on Understanding Society (for free), and CHE at York offers workshops on HES.
- If what you are looking for does not exist in one dataset, but exists in two datasets from the same source, you might want to check whether they are linkable. HES for example can be linked to the Scottish Health Survey and should soon also be linkable to ELSA.
- Another option, if this table does not contain your dream dataset, is to check websites which give researchers access to publicly available datasets. For the UK that would be the UK Data Archive; for Ireland, look at the Irish Social Science Data Archive. A similar website exists for the Netherlands, called CentER Data. The European Institute also has a good website with an overview of datasets.
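To make the points about merging and linking a little more concrete, here is a minimal Python sketch using only the standard library. Everything in it is made up for illustration: the toy records, the identifier name `pid`, the variable `ghq` and the list `hospital_ids` are assumptions, not the layout of any dataset in the table. It shows the general idea of stacking waves into one long panel, checking who is observed in every wave, and checking how many panel members could be found in a second, linkable source.

```python
from collections import defaultdict

# Hypothetical toy data: one list of records per survey wave.
# "pid" stands in for whatever persistent person identifier a survey provides.
wave1 = [
    {"pid": 1, "age": 34, "ghq": 11},
    {"pid": 2, "age": 51, "ghq": 7},
    {"pid": 3, "age": 45, "ghq": 15},
]
wave2 = [
    {"pid": 1, "age": 35, "ghq": 12},
    {"pid": 3, "age": 46, "ghq": 9},
    {"pid": 4, "age": 29, "ghq": 10},  # new entrant in wave 2
]


def stack_waves(waves):
    """Combine per-wave records into one long panel, sorted by person and wave."""
    panel = []
    for wave_number, records in waves.items():
        for record in records:
            panel.append({**record, "wave": wave_number})
    return sorted(panel, key=lambda row: (row["pid"], row["wave"]))


def waves_observed(panel):
    """Map each person ID to the sorted list of waves in which they appear."""
    seen = defaultdict(list)
    for row in panel:
        seen[row["pid"]].append(row["wave"])
    return {pid: sorted(waves) for pid, waves in seen.items()}


def link_coverage(panel, linked_ids):
    """Share of distinct panel members who also appear in a second source."""
    panel_ids = {row["pid"] for row in panel}
    return len(panel_ids & set(linked_ids)) / len(panel_ids)


panel = stack_waves({1: wave1, 2: wave2})
observed = waves_observed(panel)

# People observed in both waves (the "balanced" part of the panel).
balanced_ids = sorted(pid for pid, waves in observed.items() if len(waves) == 2)

# Hypothetical IDs found in a linkable administrative source.
hospital_ids = [1, 4, 9]
coverage = link_coverage(panel, hospital_ids)

print(balanced_ids)
print(coverage)
```

With real survey files the same steps are usually far messier (identifiers change names between waves, households split, and linkage requires consent), which is exactly why sticking to one dataset can pay off.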
I hope this shortens the search for a data source for some and I am looking forward to hearing about further interesting datasets which could help me in my research.
|Dataset|Country of origin|Started in|Still running?|Short description|
|---|---|---|---|---|
|1970 British Cohort Study (1970 BCS)| | |Yes|A study surveying a group of individuals born in 1970 at regular intervals.|
|British Household Panel Survey (BHPS)|Great Britain (& Northern Ireland from wave 7)| |No, stopped in 2008, but BHPS participants can be followed up in Understanding Society (US)|A well-organized general household panel.|
|Clinical Practice Research Datalink (CPRD)|England|2002| |Following recent NHS reforms, CPRD will become obsolete as the NHS itself will provide one dataset including both hospital and GP records, called Care Episode Service (CES).|
|Cognitive Function and Ageing Studies (CFAS)|UK|CFAS I started in the late 1980s; CFAS II started in 2008| |A UK study focusing on health and cognitive ability in the elderly.|
|Dutch Central Bank Household Study (DHS)| | |Yes|The survey is particularly well-suited to studying the effects of finances on health.|
|English Longitudinal Study of Ageing (ELSA)| | |Yes|The study focuses on the socio-economic and health dynamics among the elderly in England; therefore only individuals aged 50+ are included.|
|European Community Household Panel (ECHP)| | |No, stopped in 2001. EU-SILC succeeds this survey.|A general household panel covering many EU member states.|
|European Union Statistics on Income and Living Conditions (EU-SILC)|EU|2003/2004|Yes|The successor of the ECHP, used to create official statistics for the EU.|
|German Socio-economic Panel (GSOEP)| | | |A very long-running household panel, still ongoing and comparable to the BHPS or PSID.|
|Growing Up in Scotland (GUS)| | |Yes|A study especially well-suited to studying Scottish children.|
|Hospital Episodes Statistics (HES)|England| | |Covers everyone treated in an NHS hospital. Soon HES will develop into Care Episode Service (CES), a dataset including NHS records on both hospital and GP visits.|
|Household, Income and Labour Dynamics in Australia (HILDA)| | |Yes|A general household panel with good data quality and a decent sample size that is very easy to handle.|
|Korean Labor & Income Panel Study (KLIPS)| | |Yes|Especially well-suited to studying the relationship between labor and health.|
|Longitudinal Internet Studies for the Social sciences (LISS)| | |Yes|A smaller Dutch panel, but interesting as it has a subsection focusing on immigrants and researchers can propose new questions at no cost.|
|Longitudinal Study of Young People in England (LSYPE)| | |Yes|A cohort study following a group of individuals who were 13/14 in 2004.|
|Mental Health Minimum Dataset (MHMDS)| | | |Contains NHS record data about individuals with severe mental illnesses. In the future it will probably be more interesting to look at CES once mental health institutions are integrated (see HES).|
|Millennium Cohort Study (MCS)| | |Yes|A study surveying a group of individuals born in 2000 at regular intervals.|
|National Child Development Study (NCDS)| | |Yes|A study regularly surveying individuals born in a particular week in 1958. It is the longest cohort study I am aware of.|
|Panel Study of Income Dynamics (PSID)| | |Yes|The longest-running household survey, covering a similar spectrum to the BHPS or GSOEP.|
|Survey of Family, Income, and Employment (SoFIE)| | |No, only 8 waves were carried out.|This survey offers 8 waves of data on income and employment from New Zealand. Detailed health data is only available for a few waves.|
|Survey of Health, Ageing and Retirement in Europe (SHARE)|Most European countries| |No, only 5 waves were carried out.|A dataset specifically suited to studying the elderly over time across Europe.|
|The Irish Longitudinal study on Ageing (TILDA)|Ireland|2009/2011|Yes|The Irish version of ELSA, also covering only individuals aged 50+. It includes very specific health measures such as heart rate, blood pressure, grip strength, etc.|
|Understanding Society (US)| | |Yes|Currently the largest household panel available to researchers (ignoring administrative data).|