MIT Department: Abdul Latif Jameel Poverty Action Lab
Faculty Mentor: Prof. Benjamin Olken
Undergraduate Institution: The University of Texas at Austin
I am a rising senior from Mexico, majoring in economics and mathematics at the University of Texas at Austin. I am passionate about development, education, and public policy evaluation, and I have a strong sense of social responsibility. One of my main career goals is conducting policy-relevant research that can improve social mobility in Latin America. Before pursuing a Ph.D. in economics, I want to join a predoctoral fellowship program or work at a research center to better prepare myself as a researcher. My hobbies include Zumba, watching anime, hanging out with my friends, and playing basketball!
What Makes J-PAL RCT Datasets Popular?
Ximena Mercado Garcia¹, Sarah Kopper² and Jack Cavanagh²
¹Department of Economics and Mathematics, The University of Texas at Austin
²Abdul Latif Jameel Poverty Action Lab, Massachusetts Institute of Technology
Data sharing can benefit the research community: it allows data to be re-used, can help answer questions about external validity and generalizability, and enables results to be replicated and confirmed. The Abdul Latif Jameel Poverty Action Lab (J-PAL), a global research center working to reduce poverty by ensuring that policy is informed by scientific evidence, archives datasets and code from randomized controlled trials (RCTs) in the J-PAL Dataverse, its free data repository. J-PAL collects data on downloaders to track its progress toward its goals of democratizing data access and encouraging research transparency and replicability. To increase the number of downloads and encourage data publication, we want to understand what makes a dataset popular among downloaders. To do so, we compare the distributions of characteristics among popular datasets with those among all datasets. These characteristics include the income level of the downloaders’ countries, the general position of the downloaders, the intended use of the datasets, and the sector and region of the research projects. We find no major differences between popular and non-popular datasets. For both groups, health is the most prominent sector, and projects in South Asia receive the most downloads. Likewise, the downloader profiles of popular and non-popular datasets are similar: most datasets in each group are downloaded by graduate students and residents of high-income countries, and are intended for exploratory and replication purposes. These results rule out these features as drivers of popularity, but better data collection and further analysis, for example of dataset documentation, are needed to determine what drives download popularity. Understanding this can help advance J-PAL’s goal of making research data from randomized evaluations in the social sciences widely available and accessible.
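The abstract does not say how "popular" was defined or which statistic was used to compare the distributions, so the following is only a minimal sketch of the kind of comparison described, under stated assumptions. It assumes a download log of (dataset ID, downloader position) pairs, treats the top N datasets by download count as "popular" (a hypothetical rule), computes the share of downloads in each category for popular datasets and for all datasets, and summarizes the gap with total variation distance (a swapped-in summary statistic, not necessarily the one the authors used). The log entries and the names `popular_datasets`, `shares`, and `total_variation` are hypothetical illustrations, not J-PAL data or code.

```python
from collections import Counter

# Hypothetical toy download log: (dataset_id, downloader_position).
# Illustrative values only; these are NOT J-PAL Dataverse records.
downloads = [
    ("ds1", "graduate student"), ("ds1", "graduate student"),
    ("ds1", "faculty"), ("ds1", "graduate student"),
    ("ds2", "graduate student"), ("ds2", "undergraduate"),
    ("ds3", "faculty"),
]


def popular_datasets(log, top_n):
    """IDs of the top_n most-downloaded datasets (hypothetical popularity rule)."""
    counts = Counter(ds for ds, _ in log)
    return {ds for ds, _ in counts.most_common(top_n)}


def shares(log, keep=None):
    """Share of downloads in each category, optionally restricted to datasets in `keep`."""
    cats = Counter(cat for ds, cat in log if keep is None or ds in keep)
    total = sum(cats.values())
    return {cat: n / total for cat, n in cats.items()}


def total_variation(p, q):
    """Half the L1 distance between two categorical distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


popular = popular_datasets(downloads, top_n=1)
p_pop = shares(downloads, keep=popular)   # category shares among popular datasets
p_all = shares(downloads)                 # category shares among all datasets
print(round(total_variation(p_pop, p_all), 3))  # → 0.179
```

A distance near 0 would correspond to the abstract's finding of "no major differences"; in practice one would compute this for each characteristic (income level, position, intended use, sector, region) and would likely pair it with a formal test, such as a chi-squared test, before drawing conclusions.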