
Improving metadata infrastructure for complex surveys: Insights from the Fragile Families Challenge

Authors

  • Alexander Kindel (Princeton University)
  • Vineet Bansal (Princeton University)
  • Kristin Catena (Princeton University)
  • Thomas Hartshorne (Princeton University)
  • Kate Jaeger (Princeton University)

Abstract

Researchers rely on metadata systems to prepare data for analysis. As the complexity of datasets increases and the breadth of data analysis practices grows, existing metadata systems can limit the efficiency and quality of data preparation. This article describes the redesign of a metadata system supporting the Fragile Families and Child Wellbeing Study, based on the experiences of participants in the Fragile Families Challenge. We demonstrate how treating metadata as data (that is, releasing comprehensive information about variables in a format amenable to both automated and manual processing) can make the task of data preparation less arduous and less error-prone for all types of data analysis. We hope that our work will facilitate new applications of machine learning methods to longitudinal surveys and inspire research on data preparation in the social sciences. We have open-sourced the tools we created so that others can use and improve them.
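To illustrate what "metadata as data" can mean in practice, here is a minimal sketch assuming a hypothetical machine-readable metadata table with one row per variable and columns `name`, `wave`, `respondent`, and `type`. These column names, the variable names, and the helper functions are illustrative assumptions, not the actual Fragile Families metadata schema or the tools released with the paper. Once variable information is stored as a table, selecting variables for analysis becomes a programmatic filter rather than a manual search through codebooks.

```python
import csv
import io

# Hypothetical variable-level metadata; column names and rows are
# illustrative assumptions, not the real Fragile Families metadata.
METADATA_CSV = """name,wave,respondent,type
var_a,1,mother,continuous
var_b,1,father,categorical
var_c,3,mother,categorical
var_d,3,mother,continuous
"""


def load_metadata(text):
    """Parse the metadata CSV into a list of dicts, one per variable."""
    return list(csv.DictReader(io.StringIO(text)))


def select_variables(rows, **criteria):
    """Return names of variables whose metadata match every criterion.

    Values are compared as strings, since csv.DictReader yields strings.
    """
    return [
        row["name"]
        for row in rows
        if all(row.get(field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    rows = load_metadata(METADATA_CSV)
    # Categorical variables from the (hypothetical) mother interview, any wave.
    print(select_variables(rows, respondent="mother", type="categorical"))
```

Because the same table is plain CSV, it can also be opened and read by a person in a spreadsheet, which is one way to serve both the automated and the manual processing the abstract mentions.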

Suggested Citation

  • Alexander Kindel & Vineet Bansal & Kristin Catena & Thomas Hartshorne & Kate Jaeger, 2018. "Improving metadata infrastructure for complex surveys: Insights from the Fragile Families Challenge," Working Papers wp18-10-ff, Princeton University, School of Public and International Affairs, Center for Research on Child Wellbeing.
  • Handle: RePEc:pri:crcwel:wp18-10-ff

    Download full text from publisher

    File URL: https://fragilefamilies.princeton.edu/sites/fragilefamilies/files/wp18-10-ff.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Jeremy Freese, 2007. "Replication Standards for Quantitative Social Science," Sociological Methods & Research, vol. 36(2), pages 153-172, November.
    2. Susan Athey, 2018. "The Impact of Machine Learning on Economics," NBER Chapters, in: The Economics of Artificial Intelligence: An Agenda, pages 507-547, National Bureau of Economic Research, Inc.
    3. Reichman, Nancy E. & Teitler, Julien O. & Garfinkel, Irwin & McLanahan, Sara S., 2001. "Fragile Families: sample and design," Children and Youth Services Review, Elsevier, vol. 23(4-5), pages 303-326.
    4. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    5. Sandrah Eckel & Roger Peng, 2009. "Interacting with local and remote data repositories using the stashR package," Computational Statistics, Springer, vol. 24(2), pages 247-254, May.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Watts, Duncan J & Beck, Emorie D & Bienenstock, Elisa Jayne & Bowers, Jake & Frank, Aaron & Grubesic, Anthony & Hofman, Jake M. & Rohrer, Julia Marie & Salganik, Matthew, 2018. "Explanation, prediction, and causality: Three sides of the same coin?," OSF Preprints u6vz5, Center for Open Science.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kindel, Alexander & Bansal, Vineet & Catena, Kristin & Hartshorne, Thomas & Jaeger, Kate & Koffman, Dawn & McLanahan, Sara & Phillips, Maya & Rouhani, Shiva & Vinh, Ryan, 2018. "Improving metadata infrastructure for complex surveys: Insights from the Fragile Families Challenge," SocArXiv u8spj, Center for Open Science.
    2. Philippe Goulet Coulombe & Maxime Leroux & Dalibor Stevanovic & Stéphane Surprenant, 2022. "How is machine learning useful for macroeconomic forecasting?," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(5), pages 920-964, August.
    3. Akash Malhotra, 2018. "A hybrid econometric-machine learning approach for relative importance analysis: Prioritizing food policy," Papers 1806.04517, arXiv.org, revised Aug 2020.
    4. Croux, Christophe & Jagtiani, Julapa & Korivi, Tarunsai & Vulanovic, Milos, 2020. "Important factors determining Fintech loan default: Evidence from a lendingclub consumer platform," Journal of Economic Behavior & Organization, Elsevier, vol. 173(C), pages 270-296.
    5. Yucheng Yang & Zhong Zheng & Weinan E, 2020. "Interpretable Neural Networks for Panel Data Analysis in Economics," Papers 2010.05311, arXiv.org, revised Nov 2020.
    6. Galdo, Virgilio & Li, Yue & Rama, Martin, 2021. "Identifying urban areas by combining human judgment and machine learning: An application to India," Journal of Urban Economics, Elsevier, vol. 125(C).
    7. Stuart Gabriel & Matteo Iacoviello & Chandler Lutz, 2021. "A Crisis of Missed Opportunities? Foreclosure Costs and Mortgage Modification During the Great Recession," The Review of Financial Studies, Society for Financial Studies, vol. 34(2), pages 864-906.
    8. Michael Bailey & Drew Johnston & Theresa Kuchler & Johannes Stroebel & Arlene Wong, 2022. "Peer Effects in Product Adoption," American Economic Journal: Applied Economics, American Economic Association, vol. 14(3), pages 488-526, July.
    9. Gallin, Joshua & Molloy, Raven & Nielsen, Eric & Smith, Paul & Sommer, Kamila, 2021. "Measuring aggregate housing wealth: New insights from machine learning," Journal of Housing Economics, Elsevier, vol. 51(C).
    10. Robertas Damasevicius, 2023. "Progress, Evolving Paradigms and Recent Trends in Economic Analysis," Financial Economics Letters, Anser Press, vol. 2(2), pages 35-47, October.
    11. Barkan, Oren & Benchimol, Jonathan & Caspi, Itamar & Cohen, Eliya & Hammer, Allon & Koenigstein, Noam, 2023. "Forecasting CPI inflation components with Hierarchical Recurrent Neural Networks," International Journal of Forecasting, Elsevier, vol. 39(3), pages 1145-1162.
    12. Falco J. Bargagli-Stoffi & Massimo Riccaboni & Armando Rungi, 2020. "Machine Learning for Zombie Hunting. Firms Failures and Financial Constraints," Working Papers 01/2020, IMT School for Advanced Studies Lucca, revised Jun 2020.
    13. Onder Ozgur & Erdal Tanas Karagol & Fatih Cemil Ozbugday, 2021. "Machine learning approach to drivers of bank lending: evidence from an emerging economy," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 7(1), pages 1-29, December.
    14. Byron Botha & Rulof Burger & Kevin Kotzé & Neil Rankin & Daan Steenkamp, 2023. "Big data forecasting of South African inflation," Empirical Economics, Springer, vol. 65(1), pages 149-188, July.
    15. Mathias Bärtl & Simone Krummaker, 2020. "Prediction of Claims in Export Credit Finance: A Comparison of Four Machine Learning Techniques," Risks, MDPI, vol. 8(1), pages 1-27, March.
    16. Elena Ivona Dumitrescu & Sullivan Hue & Christophe Hurlin & Sessi Tokpavi, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    17. John Aoga & Juhee Bae & Stefanija Veljanoska & Siegfried Nijssen & Pierre Schaus, 2020. "Impact of weather factors on migration intention using machine learning algorithms," Papers 2012.02794, arXiv.org.
    18. Paolo Brunori & Vito Peragine & Laura Serlenga, 2019. "Upward and downward bias when measuring inequality of opportunity," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 52(4), pages 635-661, April.
    19. Andini, Monica & Boldrini, Michela & Ciani, Emanuele & de Blasio, Guido & D'Ignazio, Alessio & Paladini, Andrea, 2022. "Machine learning in the service of policy targeting: The case of public credit guarantees," Journal of Economic Behavior & Organization, Elsevier, vol. 198(C), pages 434-475.
    20. Helmut Wasserbacher & Martin Spindler, 2022. "Machine learning for financial forecasting, planning and analysis: recent developments and pitfalls," Digital Finance, Springer, vol. 4(1), pages 63-88, March.

    More about this item

    Keywords

    metadata; survey research; data sharing; quantitative methodology; computational social science;

    JEL classification:

    • F13 - International Economics - - Trade - - - Trade Policy; International Trade Organizations

    NEP fields

    This paper has been announced in the following NEP Reports:

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:pri:crcwel:wp18-10-ff. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Bobray Bordelon. General contact details of provider: https://edirc.repec.org/data/ccprius.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.