Printed from https://ideas.repec.org/a/sae/anname/v659y2015i1p260-273.html

Automating Open Science for Big Data

Author

Listed:
  • Mercè Crosas
  • Gary King
  • James Honaker
  • Latanya Sweeney

Abstract

The vast majority of social science research uses small (megabyte- or gigabyte-scale) datasets. These fixed-scale datasets are commonly downloaded to the researcher’s computer where the analysis is performed. The data can be shared, archived, and cited with well-established technologies, such as the Dataverse Project, to support the published results. The trend toward big data—including large-scale streaming data—is starting to transform research and has the potential to impact policymaking as well as our understanding of the social, economic, and political problems that affect human societies. However, big data research poses new challenges to the execution of the analysis, archiving and reuse of the data, and reproduction of the results. Downloading these datasets to a researcher’s computer is impractical, leading to analyses taking place in the cloud, and requiring unusual expertise, collaboration, and tool development. The increased amount of information in these large datasets is an advantage, but at the same time it poses an increased risk of revealing personally identifiable sensitive information. In this article, we discuss solutions to these new challenges so that the social sciences can realize the potential of big data.

Suggested Citation

  • Mercè Crosas & Gary King & James Honaker & Latanya Sweeney, 2015. "Automating Open Science for Big Data," The ANNALS of the American Academy of Political and Social Science, vol. 659(1), pages 260-273, May.
  • Handle: RePEc:sae:anname:v:659:y:2015:i:1:p:260-273
    DOI: 10.1177/0002716215570847

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/0002716215570847
    Download Restriction: no


    References listed on IDEAS

    1. King, Gary & Honaker, James & Joseph, Anne & Scheve, Kenneth, 2001. "Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation," American Political Science Review, Cambridge University Press, vol. 95(1), pages 49-69, March.
    2. James Honaker & Gary King, 2010. "What to Do about Missing Values in Time‐Series Cross‐Section Data," American Journal of Political Science, John Wiley & Sons, vol. 54(2), pages 561-581, April.
    3. Honaker, James & King, Gary & Blackwell, Matthew, 2011. "Amelia II: A Program for Missing Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i07).
    4. Ariel Kleiner & Ameet Talwalkar & Purnamrita Sarkar & Michael I. Jordan, 2014. "A scalable bootstrap for massive data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(4), pages 795-816, September.
    5. Gary King, 2007. "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing," Sociological Methods & Research, vol. 36(2), pages 173-199, November.
    6. Ho, Daniel E. & Imai, Kosuke & King, Gary & Stuart, Elizabeth A., 2007. "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference," Political Analysis, Cambridge University Press, vol. 15(3), pages 199-236, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Talebian, Ahmadreza & Zou, Bo & Hansen, Mark, 2018. "Assessing the impacts of state-supported rail services on local population and employment: A California case study," Transport Policy, Elsevier, vol. 63(C), pages 108-121.
    2. Iacus, Stefano & King, Gary & Porro, Giuseppe, 2009. "cem: Software for Coarsened Exact Matching," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 30(i09).
    3. Caccavale, Oscar Maria & Giuffrida, Valerio, 2020. "The Proteus composite index: Towards a better metric for global food security," World Development, Elsevier, vol. 126(C).
    4. Cohen, Joseph N, 2010. "Neoliberalism’s relationship with economic growth in the developing world: Was it the power of the market or the resolution of financial crisis?," MPRA Paper 24527, University Library of Munich, Germany.
    5. Shige Song, 2013. "Prenatal malnutrition and subsequent foetal loss risk: Evidence from the 1959-1961 Chinese famine," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 29(26), pages 707-728.
    6. Lara Lopez & Fernando L. Vázquez & Ángela J. Torres & Patricia Otero & Vanessa Blanco & Olga Díaz & Mario Páramo, 2020. "Long-Term Effects of a Cognitive Behavioral Conference Call Intervention on Depression in Non-Professional Caregivers," IJERPH, MDPI, vol. 17(22), pages 1-24, November.
    7. Baltagi, Badi H. & Bresson, Georges & Chaturvedi, Anoop & Lacroix, Guy, 2022. "Robust Dynamic Space-Time Panel Data Models Using ε-Contamination: An Application to Crop Yields and Climate Change," IZA Discussion Papers 15815, Institute of Labor Economics (IZA).
    8. Seiler, Christian & Heumann, Christian, 2013. "Microdata imputations and macrodata implications: Evidence from the Ifo Business Survey," Economic Modelling, Elsevier, vol. 35(C), pages 722-733.
    9. Ihle, Rico & Rubin, Ofir D., 2012. "Price Transmission Subject to Security‐based Trade Barriers in the Context of the Israeli‐Palestinian Conflict," 2012 Conference, August 18-24, 2012, Foz do Iguacu, Brazil 125392, International Association of Agricultural Economists.
    10. Cohen, Joseph N, 2010. "Neoliberalism’s relationship with economic growth in the developing world: Was it the power of the market or the resolution of financial crisis?," MPRA Paper 24399, University Library of Munich, Germany.
    11. Roman Matkovskyy, 2016. "A comparison of pre- and post-crisis efficiency of OECD countries: evidence from a model with temporal heterogeneity in time and unobservable individual effect," European Journal of Comparative Economics, Cattaneo University (LIUC), vol. 13(2), pages 135-167, December.
    12. Catherine Norman, 2009. "Rule of Law and the Resource Curse: Abundance Versus Intensity," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 43(2), pages 183-207, June.
    13. Ann Bostrom & Adam L. Hayes & Katherine M. Crosman, 2019. "Efficacy, Action, and Support for Reducing Climate Change Risks," Risk Analysis, John Wiley & Sons, vol. 39(4), pages 805-828, April.
    14. Ouedraogo, Idrissa & Ngoa Tabi, Henri & Atangana Ondoa, Henri & Jiya, Alex Nester, 2022. "Institutional quality and human capital development in Africa," Economic Systems, Elsevier, vol. 46(1).
    15. Ahmad R. Alsaber & Jiazhu Pan & Adeeba Al-Hurban, 2021. "Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018)," IJERPH, MDPI, vol. 18(3), pages 1-25, February.
    16. Adam S. Chilton & Mila Versteeg, 2015. "The Failure of Constitutional Torture Prohibitions," The Journal of Legal Studies, University of Chicago Press, vol. 44(2), pages 417-452.
    17. Eriko Miyama & Shunsuke Managi, 2014. "Global environmental emissions estimate: application of multiple imputation," Environmental Economics and Policy Studies, Springer;Society for Environmental Economics and Policy Studies - SEEPS, vol. 16(2), pages 115-135, April.
    18. Reinsberg, Bernhard, 2015. "Foreign Aid Responses to Political Liberalization," World Development, Elsevier, vol. 75(C), pages 46-61.
    19. World Bank & Organisation for Economic Co-operation and Development, 2017. "A Step Ahead," World Bank Publications - Books, The World Bank Group, number 27527, December.
    20. Kenneth Scheve, 2003. "Public demand for low inflation," Bank of England working papers 172, Bank of England.
