
Enhancing (publications on) data quality: Deeper data minding and fuller data confession

Author

Listed:
  • Xiao‐Li Meng

Abstract

Statistics typically treats data as inputs for analysis, whereas the broader data science enterprise deals with the entire data life cycle, including the phases that output data. This commentary argues that it would benefit statistics and (data) science if we statisticians were also to treat data as products in and of themselves, and accordingly subject them to data minding, a stringent quality inspection process that scrutinizes data conceptualization, data pre‐processing, data curation and data provenance, in addition to data collection, the traditional objective of our emphasis before data analysis. A concrete step in promoting deeper data minding is to encourage fuller data confession in (statistical) publications, that is, to entice—or at least not to disincentivize—the authors into providing more details on the genealogy of a given body of data, including an account of its deliberations, especially with respect to sources of adverse influence on data quality. The collection of articles in this special issue (on data science for societies) provides both the inspiration and aspiration for deeper data minding and fuller data confession.

Suggested Citation

  • Xiao‐Li Meng, 2021. "Enhancing (publications on) data quality: Deeper data minding and fuller data confession," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1161-1175, October.
  • Handle: RePEc:bla:jorssa:v:184:y:2021:i:4:p:1161-1175
    DOI: 10.1111/rssa.12762

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssa.12762
    Download Restriction: no


    References listed on IDEAS

    1. Angelika Geroldinger & Milan Hronsky & Florian Endel & Gottfried Endel & Rainer Oberbauer & Georg Heinze, 2021. "Estimation of the prevalence of chronic kidney disease in people with diabetes by combining information from multiple routine data collections," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1260-1282, October.
    2. Rosenbaum, Paul R., 2010. "Design Sensitivity and Efficiency in Observational Studies," Journal of the American Statistical Association, American Statistical Association, vol. 105(490), pages 692-702.
    3. Jingxian You & Paul Expert & Céire Costelloe, 2021. "Using text mining to track outbreak trends in global surveillance of emerging diseases: ProMED‐mail," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1245-1259, October.
    4. Pierre Masselot & Fateh Chebana & Céline Campagna & Éric Lavigne & Taha B.M.J. Ouarda & Pierre Gosselin, 2021. "Machine learning approaches to identify thresholds in a heat‐health warning system context," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1326-1346, October.
    5. Seppo Virtanen & Mark Girolami, 2021. "Spatio‐temporal mixed membership models for criminal activity," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1220-1244, October.
    6. Matteo Iacopini & Carlo R.M.A. Santagiustina, 2021. "Filtering the intensity of public concern from social media count data with jumps," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1283-1302, October.
    7. Marvin N. Wright & Sasmita Kusumastuti & Laust H. Mortensen & Rudi G. J. Westendorp & Thomas A. Gerds, 2021. "Personalised need of care in an ageing society: The making of a prediction tool based on register data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1199-1219, October.
    8. David Bolin & Vilhelm Verendel & Meta Berghauser Pont & Ioanna Stavroulaki & Oscar Ivarsson & Erik Håkansson, 2021. "Functional ANOVA modelling of pedestrian counts on streets in three European cities," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1176-1198, October.
    9. S. O. Tickle & I. A. Eckley & P. Fearnhead, 2021. "A computationally efficient, high‐dimensional multiple changepoint procedure with application to global terrorism incidence," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1303-1325, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Harrison, Ann E. & Lin, Justin Yifu & Xu, Lixin Colin, 2014. "Explaining Africa’s (Dis)advantage," World Development, Elsevier, vol. 63(C), pages 59-77.
    2. Anand Acharya & Lynda Khalaf & Marcel Voia & Myra Yazbeck & David Wensley, 2021. "Severity of Illness and the Duration of Intensive Care," Working Papers 2021-003, Human Capital and Economic Opportunity Working Group.
    3. Reed, Deborah K. & Aloe, Ariel M., 2020. "Interpreting the effectiveness of a summer reading program: The eye of the beholder," Evaluation and Program Planning, Elsevier, vol. 83(C).
    4. Gianluca Anese & Marco Corazza & Michele Costola & Loriana Pelizzon, 2023. "Impact of public news sentiment on stock market index return and volatility," Computational Management Science, Springer, vol. 20(1), pages 1-36, December.
    5. Nordjo, R. & Adjasi, C., 2018. "The Impact of Finance on Welfare of Smallholder Farm Household in Ghana," 2018 Conference, July 28-August 2, 2018, Vancouver, British Columbia 277142, International Association of Agricultural Economists.
    6. Hirschauer, Norbert & Grüner, Sven & Mußhoff, Oliver & Becker, Claudia & Jantsch, Antje, 2020. "Can p-values be meaningfully interpreted without random sampling?," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 14, pages 71-91.
    7. König, Pascal D. & Wenzelburger, Georg, 2021. "The legitimacy gap of algorithmic decision-making in the public sector: Why it arises and how to address it," Technology in Society, Elsevier, vol. 67(C).
    8. Hao Dong & Daniel L. Millimet, 2020. "Propensity Score Weighting with Mismeasured Covariates: An Application to Two Financial Literacy Interventions," JRFM, MDPI, vol. 13(11), pages 1-24, November.
    9. Javier Gardeazabal & Todd Sandler, 2015. "INTERPOL's Surveillance Network in Curbing Transnational Terrorism," Journal of Policy Analysis and Management, John Wiley & Sons, Ltd., vol. 34(4), pages 761-780, September.
    10. Ting Ye & Ashkan Ertefaie & James Flory & Sean Hennessy & Dylan S. Small, 2023. "Instrumented difference‐in‐differences," Biometrics, The International Biometric Society, vol. 79(2), pages 569-581, June.
    11. Ian Gazeley & Rose Holmes & Andrew Newell & Kevin Reynolds & Hector Gutierrez Rufrancos, 2023. "Escaping from hunger before WW1: the nutritional transition and living standards in Western Europe and USA in the late nineteenth century," Cliometrica, Springer;Cliometric Society (Association Francaise de Cliométrie), vol. 17(3), pages 533-565, September.
    12. Kim, Ja Young & Bartholomew, Keith & Ewing, Reid, 2020. "Another one rides the bus? The connections between bus stop amenities, bus ridership, and ADA paratransit demand," Transportation Research Part A: Policy and Practice, Elsevier, vol. 135(C), pages 280-288.
    13. Kuha, Jouni & Sturgis, Patrick, 2016. "Comment on ‘what to do instead of significance testing? Calculating the “number of counterfactual cases needed to disturb a finding”’ by Stephen Gorard and Jonathan Gorard," LSE Research Online Documents on Economics 66035, London School of Economics and Political Science, LSE Library.
    14. Margitta Minah, 2022. "What is the influence of government programs on farmer organizations and their impacts? Evidence from Zambia," Annals of Public and Cooperative Economics, Wiley Blackwell, vol. 93(1), pages 29-53, March.
    15. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    16. Pongpitch Amatyakul & Panchanok Jumrustanasan & Pornchanok Tapkham, 2023. "What can 20 billion financial transactions tell us about the impacts of Covid-19 fiscal transfers?," BIS Working Papers 1130, Bank for International Settlements.
    17. Zhexiao Lin & Peng Ding & Fang Han, 2023. "Estimation Based on Nearest Neighbor Matching: From Density Ratio to Average Treatment Effect," Econometrica, Econometric Society, vol. 91(6), pages 2187-2217, November.
    18. Siyu Heng & Dylan S. Small & Paul R. Rosenbaum, 2020. "Finding the strength in a weak instrument in a study of cognitive outcomes produced by Catholic high schools," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(3), pages 935-958, June.
    19. Daniel Gama e Colombo, 2016. "Impact Assessment of Tax Incentives to Foster Industrial Innovation in Brazil: The Case of Law 11,196/05," Working Papers, Department of Economics 2016_30, University of São Paulo (FEA-USP).
    20. Bikram Karmakar, 2022. "An approximation algorithm for blocking of an experimental design," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 1726-1750, November.
