
Anwendungen des Web Scraping in der amtlichen Statistik
[Applications for web scraping in official statistics]

Author

Listed:
  • Heidi Kühnemann

    (Hessisches Statistisches Landesamt and Statistisches Bundesamt)

Abstract

Large volumes of data are available on the World Wide Web (the "web" for short), and official statistics can make use of them as well. Extracting these data through web scraping offers a range of potential benefits, for example reducing the cost of data collection, easing the burden on respondents, improving the quality of official data, or identifying units relevant to survey sampling. Using price, tourism, labour market and business statistics as examples, this article shows how official statistics in Germany already use web scraping. Many of the applications presented here are still at an early stage of development. In other national statistical offices, data from the web are in some cases already used more extensively for experimental statistics and in production. This gap is due, among other things, to a partly insufficient legal basis for web scraping in official statistics in Germany, to an IT infrastructure that is not adequate for the method, and to a shortage of staff with the necessary qualifications.

Suggested Citation

  • Heidi Kühnemann, 2021. "Anwendungen des Web Scraping in der amtlichen Statistik [Applications for web scraping in official statistics]," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 15(1), pages 5-25, March.
  • Handle: RePEc:spr:astaws:v:15:y:2021:i:1:d:10.1007_s11943-021-00280-5
    DOI: 10.1007/s11943-021-00280-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11943-021-00280-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Azar, José & Marinescu, Ioana & Steinbaum, Marshall & Taska, Bledi, 2020. "Concentration in US labor markets: Evidence from online vacancy data," Labour Economics, Elsevier, vol. 66(C).
    2. Matthew Gentzkow & Bryan Kelly & Matt Taddy, 2019. "Text as Data," Journal of Economic Literature, American Economic Association, vol. 57(3), pages 535-574, September.
    3. Hansen, Malte, 2020. "Dynamische Preissetzung im Onlinehandel: zu den Auswirkungen auf den Verbraucherpreisindex," WISTA – Wirtschaft und Statistik, Statistisches Bundesamt (Destatis), Wiesbaden, vol. 72(5), pages 91-102.
    4. Daas, Piet J.H. & Puts, Marco J.H., 2014. "Social media sentiment and consumer confidence," Statistics Paper Series 5, European Central Bank.
    5. Blaudow, Christian & Ostermann, Holger, 2020. "Entwicklung eines generischen Programms für die Nutzung von Web Scraping in der Verbraucherpreisstatistik," WISTA – Wirtschaft und Statistik, Statistisches Bundesamt (Destatis), Wiesbaden, vol. 72(5), pages 103-113.
    6. Hansen, Malte, 2020. "Dynamische Preissetzung im Onlinehandel: zur langfristigen Anwendung von automatisierter Preiserhebung," WISTA – Wirtschaft und Statistik, Statistisches Bundesamt (Destatis), Wiesbaden, vol. 72(3), pages 14-23.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Robert Laskowski, 2022. "Differences between Online Prices and the Consumer Prices Index During Covid-19 in Germany," ACTA VSFS, University of Finance and Administration, vol. 16(1), pages 76-87.
    2. Omotosho, Babatunde S., 2020. "Central Bank Communication during Economic Recessions: Evidence from Nigeria," MPRA Paper 99655, University Library of Munich, Germany.
    3. Gregor Jarosch & Jan Sebastian Nimczik & Isaac Sorkin, 2019. "Granular Search, Market Structure, and Wages," NBER Working Papers 26239, National Bureau of Economic Research, Inc.
    4. José Azar & Emiliano Huet-Vaughn & Ioana Marinescu & Bledi Taska & Till von Wachter, 2019. "Minimum Wage Employment Effects and Labor Market Concentration," NBER Working Papers 26101, National Bureau of Economic Research, Inc.
    5. Jeremy Atack & Robert A. Margo & Paul Rhode, 2020. "‘Mechanization Takes Command’: Inanimate Power and Labor Productivity in Late Nineteenth Century American Manufacturing," NBER Working Papers 27436, National Bureau of Economic Research, Inc.
    6. Austan Goolsbee & Chad Syverson, 2023. "Monopsony Power in Higher Education: A Tale of Two Tracks," Journal of Labor Economics, University of Chicago Press, vol. 41(S1), pages 257-290.
    7. Andrew Glover & Jacob Short, 2020. "Demographic Origins of the Decline in Labor's Share," BIS Working Papers 874, Bank for International Settlements.
    8. Morgan Raux, 2019. "Looking for the "Best and Brightest": Hiring difficulties and high-skilled foreign workers," Working Papers halshs-02364921, HAL.
    9. Chris Florakis & Christodoulos Louca & Roni Michaely & Michael Weber, 2020. "Cybersecurity Risk," Working Papers 2020-178, Becker Friedman Institute for Research In Economics.
    10. Ekaterina Prytkova, 2021. "ICT's Wide Web: a System-Level Analysis of ICT's Industrial Diffusion with Algorithmic Links," Jena Economics Research Papers 2021-005, Friedrich-Schiller-University Jena.
    11. Pérez, Jorge & Vial, Felipe & Zárate, Román, 2022. "Urban Transit Infrastructure: Spatial Mismatch and Labor Market Power," Research Department working papers 1992, CAF Development Bank Of Latinamerica.
    12. Eghbal Rahimikia & Stefan Zohren & Ser-Huang Poon, 2021. "Realised Volatility Forecasting: Machine Learning via Financial Word Embedding," Papers 2108.00480, arXiv.org, revised Mar 2023.
    13. Daniel Levy & Tamir Mayer & Alon Raviv, 2020. "Academic Scholarship in Light of the 2008 Financial Crisis: Textual Analysis of NBER Working Papers," Working Papers hal-02488796, HAL.
    14. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    15. Daniel Monte & Roberto Pinheiro, 2021. "Labor market competition over the business cycle," Economic Inquiry, Western Economic Association International, vol. 59(4), pages 1593-1615, October.
    16. Orley Ashenfelter & David Card & Henry Farber & Michael R. Ransom, 2022. "Monopsony in the Labor Market: New Empirical Results and New Public Policies," Journal of Human Resources, University of Wisconsin Press, vol. 57(S), pages 1-10.
    17. Martin Baumgaertner & Johannes Zahner, 2021. "Whatever it takes to understand a central banker - Embedding their words using neural networks," MAGKS Papers on Economics 202130, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
    18. Martins, Pedro S. & Melo, António, 2024. "Making their own weather? Estimating employer labour-market power and its wage effects," Journal of Urban Economics, Elsevier, vol. 139(C).
    19. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
    20. Martins, Pedro S., 2022. "The wage effects of employers' associations: A case study of the private schools sector," GLO Discussion Paper Series 1163, Global Labor Organization (GLO).
