Printed from https://ideas.repec.org/a/bla/jorssa/v181y2018i3p555-605.html

Statistical challenges of administrative and transaction data

Author

Listed:
  • David J. Hand

Abstract

Administrative data are becoming increasingly important. They are typically the side effect of some operational exercise and are often seen as having significant advantages over alternative sources of data. Although it is true that such data have merits, statisticians should approach the analysis of such data with the same cautious and critical eye as they approach the analysis of data from any other source. The paper identifies some statistical challenges, with the aim of stimulating debate about and improving the analysis of administrative data, and encouraging methodology researchers to explore some of the important statistical problems which arise with such data.

Suggested Citation

  • David J. Hand, 2018. "Statistical challenges of administrative and transaction data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(3), pages 555-605, June.
  • Handle: RePEc:bla:jorssa:v:181:y:2018:i:3:p:555-605
    DOI: 10.1111/rssa.12315



