Data

There has been a spectacular increase in the availability and quality of data from developing countries in recent years. Many of these datasets are either in the public domain or can be obtained at modest cost from the data collection agency. This page is intended as a resource to help locate those data. The International Household Survey Network provides links to documentation for many of these and other data. Some of the data are available at ICPSR, the World Bank Microdata Catalog and the Harvard Dataverse Network.

In addition, there has been a huge investment in methods that seek to isolate exogenous variation in programs, policies, opportunities and constraints faced by individuals, households, families, firms and communities. Randomized controlled trials have played a central role in this work. The AEA maintains a Registry of RCTs, which is an extremely valuable resource for investigators to register their RCTs and to learn about on-going or completed RCTs. Many authors are also posting their collected data to Dataverse, so please search there too.


Household surveys

Family Life Surveys

Indonesia Family Life Survey (IFLS)

The IFLS is an on-going longitudinal survey of individuals, households, families, communities and facilities. The first wave of the survey was conducted in 1993/4 and included interviews with 7,224 households in 321 communities in 13 provinces in Indonesia. The survey is representative of about 83% of the Indonesian population. The second wave, in 1997/8, was followed by a survey of 25% of the sample enumeration areas in 1998. IFLS3 was conducted in 2000 and IFLS4 was conducted in 2007/08.

The surveys contain retrospective histories about, for example, employment, marriage, fertility and migration over the life course of each respondent. The surveys also include household consumption, assets, self-reported health status and a battery of health measures (including anthropometrics, hemoglobin, blood pressure, lung capacity and time to stand from a sitting position). In 2007, cholesterol and dry blood spots were added.

Public domain data and documentation are available on the web.


Malaysian Family Life Surveys (MFLS)

conducted in 1976/7 and 1988 also contain extensive histories on employment, marriage, fertility and migration. Respondents in the first wave were followed in the subsequent waves; in the second wave, a refreshment sample was added. MFLS1 (1976/77) and MFLS2 (1988) are in the public domain.


Matlab Health and Social Survey (MHSS)

was conducted in 1996 and covers the same area as the Matlab Demographic Surveillance System. The data are in the public domain. A resurvey is underway.


Mexican Family Life Survey (MxFLS)

is an on-going nationally representative longitudinal survey of individuals, households, families and communities. The first wave was conducted in 2002. The first follow-up was completed in 2005. The second follow-up was conducted in 2009/10. In addition to consumption, income, wealth, employment, marriage and fertility, the survey contains a module on crime and victimization as well as migration histories. Respondents are followed if they move and interviewed in their new location. This includes people who move to the U.S. and those who return to Mexico. Biomarker data are collected, and the survey includes assets, self-reported health status, a battery of health measures and dry blood spots. Data from the first two waves collected in Mexico are in the public domain.


Guatemalan Survey of Family Health (EGSF)

is a single cross-section survey which was conducted in rural communities in 4 of Guatemala’s 22 departments. The survey was fielded in 1995. The data are publicly available.


University of North Carolina Surveys


Cebu Longitudinal Health and Nutrition Surveys

Conducted by a team of researchers from the United States and the Philippines, the Cebu Longitudinal Health and Nutrition Survey is an ongoing study of a cohort of Filipino women who gave birth between May 1, 1983 and April 30, 1984 and have been re-interviewed periodically since then. The data are available at UNC.


China Health and Nutrition Survey

The China Health and Nutrition Survey was conducted in 1989 and 1991 in 8 provinces in China and provides a wealth of detailed information on health and nutrition of adults and children including physical examinations. These data are available at UNC.


Nang Rong (Thailand) projects

The Nang Rong projects represent a major data collection effort that was started in 1984 with a census of households in 51 villages. The villages were resurveyed in 1988 and again in 1994/95. New entrants were interviewed and a subsample of out-migrants was followed. These data are available at UNC.


Russia Longitudinal Monitoring Survey (RLMS)

is an on-going panel survey of households in Russia that began in 1992. These data are available at UNC.


University of Washington CSDE Vietnam Research Projects


Vietnam Life History Survey (1991)

The Vietnam Life History Survey is a collaboration between the University of Washington, the Institute of Sociology and the Institute of Social Sciences in Vietnam. The survey collects data from about 100 households in two urban and two rural areas in Vietnam. The data are available at CSDE at UW.


Vietnam Longitudinal Survey (1995-1998)

The Vietnam Longitudinal Survey is a collaboration between Professor Charles Hirschman, University of Washington, and the Institute of Sociology in Vietnam. The survey collects detailed demographic information from all adult respondents in over 1,800 households in one area of Vietnam. The data are available at CSDE at UW.


Rural Economic and Demographic Survey (REDS)

The National Council of Applied Economic Research has been surveying households and villages since the late 1960s as part of REDS. Some of the respondents have been interviewed in several rounds, yielding a panel spanning 30 years. The raw data from the 1969, 1982 and 1999 waves are available on Andrew Foster’s web site. Foster provides an overview of the files here.


Indian States Data from EOPP, LSE

State-level data from India compiled by the Economic Organisation and Public Policy Programme at the LSE are available here. Topics covered include:

  • land reform
  • media and political agency
  • labor regulation
  • quality of life
  • economic reforms

India Agriculture and Climate Data Set

The database provides district level data on agriculture and climate in India from 1957/58 through 1986/87. The dataset includes information on

  • Area planted, production and farm harvest prices for five major and fifteen minor crops.
  • Areas under irrigated and high-yielding varieties (HYV) for major crops.
  • Data on agricultural inputs, such as fertilizers, bullocks and tractors, in both quantity and price terms.
  • Agricultural labor, cultivators, wages and factory earnings, rural population and literacy proportion.
  • Meteorological station level climate data (average climate over 30 year period)
  • Soil data

The dataset was compiled by Apurva Sanghi, K.S. Kavi Kumar, and James W. McKinsey, of the World Bank and draws on work by James McKinsey and Robert Evenson of Yale University.
For more information, click here. The data and documentation are available here. A note on converting the files to STATA written by Gareth Nellis is here.


National Sample Survey Organisation

The National Sample Survey Organisation (NSSO) of India has a long tradition of conducting high quality surveys. NSSO carries out socio-economic surveys, undertakes field work for the Annual Survey of Industries and follow-up surveys of the Economic Census, conducts sample checks on area enumeration and crop estimation surveys, and prepares the urban frames used for drawing urban samples. It also collects price data from rural and urban sectors.
The data are available for purchase on CD.


China Health and Retirement Longitudinal Study (CHARLS)

The China Health and Retirement Longitudinal Study (CHARLS) is patterned after the Health and Retirement Study (HRS) in the US. Pilot data were collected in 2008 in two provinces: Zhejiang and Gansu (the richest and poorest provinces). One person aged 45 and over was randomly chosen in each household with an age-eligible person, and they and their spouse were interviewed. The sample is representative of people 45 and over in these two provinces in China. This sample contains data on 1,570 households and just under 2,700 individuals. Data are available here. The first nationally representative wave of CHARLS will be fielded in 2011 and the second in 2013.


Mexican Health and Aging Study

is a prospective longitudinal survey of older adults (born before 1951) and their spouses. The first wave was conducted in 2001 and interviewed almost 10,000 adults and 5,000 spouses. The first follow-up was completed in 2003. The project is a collaboration of researchers at the Universities of Pennsylvania, Maryland and Wisconsin with INEGI in Mexico. It is directed by Beth Soldo.


SABE (Salud Bienestar Y Envejecimiento en America Latina y El Caribe)

is a series of comparable cross-national surveys on health and aging organized as a cooperative venture among researchers in Argentina, Barbados, Brazil, Chile, Cuba, Mexico and Uruguay. The goal of the project is to describe health, cognitive achievement and access to health care among people age 60 and older with a special focus on people over 80 years old. Professor Alberto Palloni is the PI of the project, which has been funded by PAHO and the NIA.


Colombian Familias en Accion

Familias en Accion is a poverty alleviation program in Colombia. Data are available here. Evaluation of the program is described at the Center for the Evaluation of Development Policies at IFS.


Learning and Education Achievement in Punjab Schools

The Learning and Education Achievement in Punjab Schools (LEAPS) Project is a multi-year project initiated by researchers at Harvard University, Pomona College, and the World Bank that attempts to capture and track changes in the educational universe at the primary level (up to grade 5) in 112 villages in Pakistan. The main component of the project is a set of extensive surveys designed and conducted by the LEAPS team, with care being taken to be representative of the various actors in the educational market. The data consist of questionnaires administered to all 823 primary schools (public, private, NGO) in the 112 villages, to over 800 teachers (with basic information on 5,000 teachers), 1,800 households and 6,000 school children, and achievement tests of 12,000 class 3 children in Mathematics, English, and Urdu. All children, households, schools and teachers are matched and then followed over three additional (annual) rounds of surveys, for a complete 4-year panel.

The first round of data from these surveys & related documentation is now publicly available for researchers at: www.leapsproject.org. The website also provides related information (questionnaires for all rounds, preliminary papers, and a LEAPS report that highlights findings from the first round).


South African DataFirst Data Archive

DataFirst, a research unit at the University of Cape Town, is a web portal for South African census and survey data as well as metadata and all research output based on these data. The catalogue of downloadable datasets is here.


Living Standards Measurement Studies (LSMS)

Since 1980, the World Bank has been collecting multi-purpose household survey data in several countries under the Living Standards Measurement Study umbrella. That site contains information about the project, lists the countries included in the project and describes how data may be accessed. For some of the surveys, the data are available on the web.


The Rural Income Generating Activities (RIGA) database

The RIGA project, a collaborative effort of FAO, the World Bank and American University in Washington, DC, aims to promote the understanding of the roles, relationships and synergies of on-farm and off-farm income generating activities for rural households. Building on existing household living standards surveys, the project has developed methodologically consistent, internationally comparable income data that are now available free of charge from the project’s website.

The database contains cross-country comparable indicators of household-level income for 26 surveys representing 16 countries across Africa, Asia, Eastern Europe and Latin America, making it a valuable resource for researchers and analysts in the development field. The surveys are both cross-sectional and panel, and currently run from 1992 through 2005; more surveys will be added to the database as they become available. While the RIGA project focuses mainly on the analysis of rural issues, the dataset contains information on both urban and rural income sources.

Find out more about the RIGA project: http://www.fao.org/es/ESA/riga/

Learn how to access the data: http://www.fao.org/es/ESA/riga/english/form_en.htm

Access the RIGA project publications: http://www.fao.org/es/ESA/riga/english/pubs_en.htm


Jameel Poverty Action Lab (J-PAL)

Descriptions of evaluations conducted by the Abdul Latif Jameel Poverty Action Lab are available from the J-PAL evaluations page. Data underlying these evaluations are available from the data pages at Harvard Dataverse.


The International Food Policy Research Institute

IFPRI has conducted several very innovative surveys in African and Asian countries. Many of these surveys are available for research purposes. See their home page and click on datasets.


Townsend Thai Project

and associated Thai databases are described here. The Townsend Thai project began in 1997 with a relatively large cross-section survey. Annual resurveys have been conducted and a monthly survey was initiated in August 1998.


Agricultural Innovation and Resource Management in Ghana

This is an integrated longitudinal farm production and consumption survey conducted by Christopher Udry and Markus Goldstein (Yale University). Data may be downloaded from here.


Social Networks Project (Kenya and Malawi)

collects longitudinal socio-demographic data in Kenya and Malawi under the direction of Susan Watkins and Jere Behrman. Data are available for downloading here.


South African National Income Dynamics Study (NIDS)

NIDS is a nationally representative panel study that examines income, consumption and expenditure of households over time in South Africa. The baseline survey was conducted in 2008 and the first follow-up was conducted in 2010. The data will throw light on matters such as coping strategies deployed in response to shocks and unexpected events, whether negative or positive, such as a death in the family or an unemployed relative obtaining a job.
In addition to income and expenditure dynamics, study themes include the determinants of changes in poverty and well-being; household composition and structure; fertility and mortality; migrancy and migrant strategies; labour market participation and economic activity; human capital formation, health and education; vulnerability and social capital. See the NIDS web page for details.


SALDRU Langeberg Survey, South Africa

The Langeberg integrated household survey was conducted by a consortium of South African and American universities along with government and non-government agencies in South Africa. Data may be requested by sending an email. See their web page for details.


Survey on the Status of Women and Fertility

in five Asian countries collected detailed information on the status of women and their husbands in conjunction with fertility choices. Data collected in Malaysia, Pakistan, Philippines and Thailand in 1993/1994 are available for downloading here.


Center for Data Sharing

is housed at the Economic Growth Center at Yale University and distributes:

  • Bicol longitudinal surveys, Philippines
  • ICRISAT India village level study
  • ICRISAT Burkina Faso farm production survey


Mexican Migration Project

Professor Doug Massey and collaborators have collected several waves of surveys on migration from central Mexico with special sub-samples of Mexicans living in Chicago. The data can be obtained from the MMP web site or by contacting Kristin Espinosa at the University of Pennsylvania. Her e-mail address is espinosa@pop.upenn.edu.


Latin American Migration Project

is an extension of the Mexican Migration Project (MMP). The project is directed by Professor Doug Massey who, with his collaborators, has collected data in Puerto Rico, the Dominican Republic, Nicaragua, Costa Rica and Peru. Data are available here.


Central American Population Project

collects fertility and health surveys carried out in Central America. Data from Belize, Guatemala, El Salvador, Honduras, Nicaragua, Costa Rica and Panama are included in the collection.


Tsimane Amazonian Panel Study (TAPS)

TAPS is an annual panel data set covering the period 2002 through 2006 that follows a native Amazonian horticultural and foraging society experiencing rapid integration with the rest of the world. The study has been tracking about 1,500 native Amazonians in about 250 households of 13 villages along the Maniqui River, Department of Beni, Bolivia, and has introduced agricultural development projects. TAPS surveys take place every year during June-August. The first five years of data, 2002-2006, are now available to the public in STATA. To request access to the 2002-2006 panel data set and its documentation, go to the following web site: http://people.brandeis.edu/~rgodoy/research/pgs/panel.html or contact Ricardo Godoy, (781) 736-2784, rgodoy@brandeis.edu.


World Fertility Surveys

The World Fertility Surveys (WFS) were conducted in 41 countries during the 1970s and early 1980s. The data are all in the public domain and available at the Office of Population Research at Princeton University. This is a very good site to find out about data on fertility including the Chinese In-Depth Fertility Surveys.

Countries for which World Fertility Surveys are available include:

Africa

Benin; Cameroon, 1978; Cote d’Ivoire, 1980-81; Egypt, 1980; Ghana, 1979-80; Kenya, 1977-1978; Lesotho, 1977; Mauritania, 1981; Morocco, 1980; Nigeria, 1981-82; Rwanda, 1983; Senegal, 1978; Sudan (North), 1978-79; Tunisia, 1978

Americas

Colombia, 1976; Costa Rica, 1976; Dominican Republic, 1975 and 1980; Ecuador, 1979-80; Guyana, 1975; Haiti, 1977; Jamaica, 1975-76; Mexico, 1976-77; Panama, 1975-76; Paraguay, 1979; Peru, 1977-78; Trinidad & Tobago, 1977; Venezuela, 1977

Asia

Bangladesh, 1975-76; Fiji, 1974; Indonesia, 1976; Jordan, 1976; Korea, Republic of, 1974; Malaysia, 1974; Nepal, 1976; Pakistan, 1975; Philippines, 1978; Sri Lanka, 1975; Syria, 1978; Thailand, 1975; Turkey, 1978; Yemen Arab Republic, 1979

Europe

Portugal, 1979-80


Demographic and Health Surveys

More recent fertility, mortality and health data are available from the Demographic and Health Surveys (DHS). DHS has been collecting national sample surveys of population and maternal and child health in many developing countries since the 1980s. Data are currently collected under the umbrella of the Measure project, which is administered by Macro International. Data have been collected in four waves:

  • DHS-I (1986-90)
  • DHS-II (1991-1992)
  • DHS-III (1993-1997)
  • Measure (1998-present)

See the Measure DHS website for a list of countries that have been surveyed.


International Reproductive Health Surveys

The Centers for Disease Control (CDC) assists countries throughout the world in the development, implementation and analysis of national reproductive health surveys.


Africa Household Survey Project

Provides a listing of many household surveys conducted across Africa.


Sticerd (LSE) Fieldwork web-page

which is managed by Markus Goldstein, provides links to additional surveys, questionnaires and survey methods materials.


Firm level sources


African manufacturing sector

Firm level data collected by The World Bank in collaboration with the Centre for the Study of African Economies, Oxford University, and several Government Statistical Agencies may be downloaded from this site.


Centre for the Study of African Economies

CSAE faculty have collected firm level data in several African countries. Data from Ghana, Ethiopia and Tanzania and, from a comparative study, Cameroon, Ghana, Kenya and Zimbabwe are available from the CSAE web-site. Some of these data are also available on the World Bank web site.