
DataGorri: a tool for automated data collection of tabular web content

Author

  • Julian Hackinger (Technical University of Munich)

Abstract

The era of the internet has been a boon for empirical and evidence-based research. By providing ever-increasing amounts of data, the internet offers numerous opportunities for new empirical studies. While some research questions require data that was previously more time-consuming to collect, other data was simply not available before the creation of the internet. However, publicly available information is still often unstructured, and its collection can be highly resource-intensive. In this paper we present DataGorri, a software tool that enables the user-friendly, automated collection of repetitive and non-repetitive tabular data that is freely available on websites. This paper explains the motivation behind the software’s creation, describes its usage, and discusses its advantages and limitations.
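
To make "collection of tabular web content" concrete, the sketch below shows one generic way to pull the rows of an HTML table into Python lists using only the standard library. It is an illustration under stated assumptions, not DataGorri's implementation: the abstract does not describe how the software works internally, and the names TableExtractor, extract_rows, and SAMPLE_HTML, as well as the sample table itself, are hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect <td>/<th> cell text, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return every table row found in `html` as a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page fragment, used only for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Year</th><th>Value</th></tr>
  <tr><td>Alpha</td><td>2017</td><td>1.5</td></tr>
  <tr><td>Beta</td><td>2018</td><td>2.0</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

A real collection tool would also have to fetch pages, follow pagination, and cope with irregular markup; this sketch covers only the step of turning one table into structured rows.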

Suggested Citation

  • Julian Hackinger, 2018. "DataGorri: a tool for automated data collection of tabular web content," Netnomics, Springer, vol. 19(1), pages 31-41, October.
  • Handle: RePEc:kap:netnom:v:19:y:2018:i:1:d:10.1007_s11066-018-9125-2
    DOI: 10.1007/s11066-018-9125-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11066-018-9125-2
    File Function: Abstract
    Download Restriction: Access to full text is restricted to subscribers.


    Citations

    Cited by:

    1. Hackinger, Julian, 2019. "Ignoring millions of Euros: Transfer fees and sunk costs in professional football," Journal of Economic Psychology, Elsevier, vol. 75(PB).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Seyed Reza Mirnezami & Catherine Beaudry, 2016. "The effect of holding a research chair on scientists’ productivity," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(2), pages 399-454, May.
    2. Camil Demetrescu & Andrea Ribichini & Marco Schaerf, 2020. "Are Italian research assessment exercises size-biased?," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 533-549, October.
    3. Kanybek Nur-tegin & Sanjay Venugopalan & Jessica Young, 2020. "Teaching Load and Other Determinants of Research Output Among University Faculty," The American Economist, Sage Publications, vol. 65(2), pages 300-311, October.
    4. Marton Demeter & Agnes Jele & Zsolt Balázs Major, 2022. "The model of maximum productivity for research universities SciVal author ranks, productivity, university rankings, and their implications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4335-4361, August.
    5. Bäker, Agnes, 2015. "Non-tenured post-doctoral researchers’ job mobility and research output: An analysis of the role of research discipline, department size, and coauthors," Research Policy, Elsevier, vol. 44(3), pages 634-650.
    6. John Rigby, 2009. "Comparing the scientific quality achieved by funding instruments for single grant holders and for collaborative networks within a research system: Some observations," Scientometrics, Springer;Akadémiai Kiadó, vol. 78(1), pages 145-164, January.
    7. Abramo, Giovanni & D’Angelo, Ciriaco Andrea, 2015. "Ranking research institutions by the number of highly-cited articles per scientist," Journal of Informetrics, Elsevier, vol. 9(4), pages 915-923.
    8. Maaike Verbree & Edwin Horlings & Peter Groenewegen & Inge Weijden & Peter Besselaar, 2015. "Organizational factors influencing scholarly performance: a multivariate study of biomedical research groups," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(1), pages 25-49, January.
    9. Püttmann, Vitus & Thomsen, Stephan L. & Trunzer, Johannes, 2020. "Zur Relevanz von Ausstattungsunterschieden für Forschungsleistungsvergleiche: Ein Diskussionsbeitrag für die Wirtschaftswissenschaften in Deutschland," Hannover Economic Papers (HEP) dp-679, Leibniz Universität Hannover, Wirtschaftswissenschaftliche Fakultät, revised Mar 2021.
    10. María Victoria Anauati & Sebastian Galiani & Ramiro H. Gálvez, 2020. "Differences In Citation Patterns Across Journal Tiers: The Case Of Economics," Economic Inquiry, Western Economic Association International, vol. 58(3), pages 1217-1232, July.
    11. Krapf, Matthias & Ursprung, Heinrich W. & Zimmermann, Christian, 2017. "Parenthood and productivity of highly skilled labor: Evidence from the groves of academe," Journal of Economic Behavior & Organization, Elsevier, vol. 140(C), pages 147-175.
    12. William W. Olney, 2017. "English Proficiency And Labor Market Performance: Evidence From The Economics Profession," Economic Inquiry, Western Economic Association International, vol. 55(1), pages 202-222, January.
    13. Giovanni Abramo & Ciriaco Andrea D’Angelo & Francesco Rosati, 2016. "The north–south divide in the Italian higher education system," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 2093-2117, December.
    14. Abramo, Giovanni & D’Angelo, Ciriaco Andrea, 2016. "A comparison of university performance scores and ranks by MNCS and FSS," Journal of Informetrics, Elsevier, vol. 10(4), pages 889-901.
    15. Schuelke-Leech, Beth-Anne, 2013. "Resources and research: An empirical study of the influence of departmental research resources on individual STEM researchers involvement with industry," Research Policy, Elsevier, vol. 42(9), pages 1667-1678.
    16. Sultan Orazbayev, 2017. "Diversity and collaboration in Economics," UCL SSEES Economics and Business working paper series 2017-4, UCL School of Slavonic and East European Studies (SSEES).
    17. Victoria Anauati & Sebastian Galiani & Ramiro H. Gálvez, 2016. "Quantifying The Life Cycle Of Scholarly Articles Across Fields Of Economic Research," Economic Inquiry, Western Economic Association International, vol. 54(2), pages 1339-1355, April.
    18. Simona Malovana & Martin Hodula & Zuzana Rakovska, 2020. "Researching the Research: A Central Banking Edition," Research and Policy Notes 2020/03, Czech National Bank.
    19. Yushan Hu & Ben G. Li, 2021. "The production economics of economics production," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 30(1), pages 228-255, February.
    20. Lutz Bornmann & Alexander Butz & Klaus Wohlrabe, 2018. "What are the top five journals in economics? A new meta-ranking," Applied Economics, Taylor & Francis Journals, vol. 50(6), pages 659-675, February.

    More about this item

    Keywords

    Software; DataGorri; Web scraper; Data scraper; Crawler; Data collection;

    Lists

    This item is featured on the following reading lists, Wikipedia, or ReplicationWiki pages:
    1. Papers using RePEc data
