Printed from https://ideas.repec.org/p/osf/socarx/h572n.html

Measuring a country’s digital industrial structure: commercial websites and weakly supervised classification to the rescue

Authors
  • Occhini, Giulia
  • Tranos, Emmanouil
  • Wolf, Levi John

    (University of Bristol)

Abstract

In this paper we propose the use of commercial websites and a contextualized weak supervision framework as an alternative to industrial taxonomies for identifying and classifying digital industrial activity. Despite the crucial importance of industrial taxonomies for government and research, their static nature leaves them unable to accurately capture a country’s industrial structure. This is particularly problematic for firms producing novel, digital outputs, which are currently classified into the wrong industrial sectors and thus rendered almost invisible to official statistics. To address this issue, we show how commercial websites can complement, or even substitute for, industrial classification surveys and ultimately yield a more complete, up-to-date understanding of how a country’s industrial structure evolves. In the process, we compare classification results obtained using only companies’ landing pages with those obtained using their full websites, finding that a company’s landing page is a better predictor of its industrial class than its full website. We also suggest that our framework could support longitudinal analyses by proposing a pipeline that uses archived websites. Policymakers can use this method to identify classes of industries from a bottom-up perspective, and the paper also makes the case for using state-of-the-art NLP techniques in economics and business research.

Suggested Citation

  • Occhini, Giulia & Tranos, Emmanouil & Wolf, Levi John, 2023. "Measuring a country’s digital industrial structure: commercial websites and weakly supervised classification to the rescue," SocArXiv h572n, Center for Open Science.
  • Handle: RePEc:osf:socarx:h572n
    DOI: 10.31219/osf.io/h572n

    Download full text from publisher

    File URL: https://osf.io/download/6405effec74723023d10b56b/
    Download Restriction: no

    File URL: https://libkey.io/10.31219/osf.io/h572n?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Ron A. Boschma & Koen Frenken, 2006. "Why is economic geography not an evolutionary science? Towards an evolutionary economic geography," Journal of Economic Geography, Oxford University Press, vol. 6(3), pages 273-302, June.
    2. Frank Neffke & Martin Henning & Ron Boschma, 2011. "How Do Regions Diversify over Time? Industry Relatedness and the Development of New Growth Paths in Regions," Economic Geography, Clark University, vol. 87(3), pages 237-265, July.
    3. Rizov, Marian & Vecchi, Michela & Domenech, Josep, 2022. "Going online: Forecasting the impact of websites on productivity and market structure," Technological Forecasting and Social Change, Elsevier, vol. 184(C).
    4. Nathan, Max & Rosso, Anna, 2015. "Mapping digital businesses with big data: Some early findings from the UK," Research Policy, Elsevier, vol. 44(9), pages 1714-1733.
    5. Alex Bishop & Juan Mateos-Garcia & George Richardson, 2022. "Using Text Data to Improve Industrial Statistics in the UK," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2022-01, Economic Statistics Centre of Excellence (ESCoE).
    6. Shaobo Li & Jie Hu & Yuxin Cui & Jianjun Hu, 2018. "DeepPatent: patent classification with convolutional neural networks and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(2), pages 721-744, November.
    7. Jan Kinne & Janna Axenbeck, 2020. "Web mining for innovation ecosystem mapping: a framework and a large-scale pilot study," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2011-2041, December.
    8. Daniel Arribas-Bel & Jessie Bakens, 2019. "Use and validation of location-based services in urban research: An example with Dutch restaurants," Urban Studies, Urban Studies Journal Limited, vol. 56(5), pages 868-884, April.
    9. Koen Frenken & Frank Van Oort & Thijs Verburg, 2007. "Related Variety, Unrelated Variety and Regional Economic Growth," Regional Studies, Taylor & Francis Journals, vol. 41(5), pages 685-697.
    10. Dalziel, Margaret, 2007. "A systems-based approach to industry classification," Research Policy, Elsevier, vol. 36(10), pages 1559-1574, December.
    11. Sanjeev Bhojraj & Charles M. C. Lee & Derek K. Oler, 2003. "What's My Line? A Comparison of Industry Classification Schemes for Capital Market Research," Journal of Accounting Research, Wiley Blackwell, vol. 41(5), pages 745-774, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jürgen Essletzbichler, 2013. "Relatedness, industrial branching and technological cohesion in U.S. metropolitan areas," Papers in Evolutionary Economic Geography (PEEG) 1307, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised May 2013.
    2. Martin, Hanna & Martin, Roman & Zukauskaite, Elena, 2018. "The Multiple Roles of Demand in Regional Development: A Conceptual Analysis," Papers in Innovation Studies 2018/10, Lund University, CIRCLE - Centre for Innovation Research.
    3. Ron Boschma & Koen Frenken, 2011. "The emerging empirics of evolutionary economic geography," Journal of Economic Geography, Oxford University Press, vol. 11(2), pages 295-307, March.
    4. Carlo Corradini, 2019. "Location determinants of green technological entry: evidence from European regions," Small Business Economics, Springer, vol. 52(4), pages 845-858, April.
    5. Stefano Breschi & Camilla Lenzi, 2015. "The Role of External Linkages and Gatekeepers for the Renewal and Expansion of US Cities' Knowledge Base, 1990-2004," Regional Studies, Taylor & Francis Journals, vol. 49(5), pages 782-797, May.
    6. Tom Broekel & Rune Dahl Fitjar & Silje Haus-Reve, 2021. "The roles of diversity, complexity, and relatedness in regional development – What does the occupational perspective add?," Papers in Evolutionary Economic Geography (PEEG) 2135, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Nov 2021.
    7. Martin Henning & Erik Stam & Rik Wenting, 2013. "Path Dependence Research in Regional Economic Development: Cacophony or Knowledge Accumulation?," Regional Studies, Taylor & Francis Journals, vol. 47(8), pages 1348-1362, September.
    8. Silvia Rita Sedita & Ivan De Noni & Luciano Pilotti, 2014. "How do related variety and differentiated knowledge bases influence the resilience of local production systems?," "Marco Fanno" Working Papers 0180, Dipartimento di Scienze Economiche "Marco Fanno".
    9. José M. Gaspar, 2018. "A prospective review on New Economic Geography," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 61(2), pages 237-272, September.
    10. Shengjun Zhu & Chong Wang & Canfei He, 2019. "High-speed Rail Network and Changing Industrial Dynamics in Chinese Regions," International Regional Science Review, vol. 42(5-6), pages 495-518, September.
    11. Hanna Martin & Roman Martin & Elena Zukauskaite, 2019. "The multiple roles of demand in new regional industrial path development: A conceptual analysis," Environment and Planning A, vol. 51(8), pages 1741-1757, November.
    12. Michael Fritsch & Sandra Kublina, 2019. "Persistence and change of regional new business formation in the national league table," Journal of Evolutionary Economics, Springer, vol. 29(3), pages 891-917, July.
    13. Christoph Stich & Emmanouil Tranos & Max Nathan, 2023. "Modeling clusters from the ground up: A web data approach," Environment and Planning B, vol. 50(1), pages 244-267, January.
    14. Carolina Castaldi & Koen Frenken & Bart Los, 2015. "Related Variety, Unrelated Variety and Technological Breakthroughs: An analysis of US State-Level Patenting," Regional Studies, Taylor & Francis Journals, vol. 49(5), pages 767-781, May.
    15. Van den Berghe, Karel & Dąbrowski, Marcin & Ersoy, Aksel & Wandl, Alexander & van Bueren, Ellen, 2019. "The Circular Economy: a Re-Emerging Industry? [working paper]," SocArXiv tgvzj, Center for Open Science.
    16. Elekes, Zoltán, 2016. "A regionális növekedés új tényezői az evolúciós gazdaságföldrajzi kutatásokban. A változatosság és a technológiai közelség [The new factors of regional growth in research into evolutionary economic," Közgazdasági Szemle (Economic Review - monthly of the Hungarian Academy of Sciences), Közgazdasági Szemle Alapítvány (Economic Review Foundation), vol. 0(3), pages 307-329.
    17. Canfei He & Qi Guo & David Rigby, 2015. "Industry Relatedness, Agglomeration Externalities and Firm Survival in China," Papers in Evolutionary Economic Geography (PEEG) 1528, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Sep 2015.
    18. Ron Boschma, 2017. "A concise history of the knowledge base literature: challenging questions for future research," Papers in Evolutionary Economic Geography (PEEG) 1721, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Sep 2017.
    19. Alessia Lo Turco & Daniela Maggioni, 2016. "On firms’ product space evolution: the role of firm and local product relatedness," Journal of Economic Geography, Oxford University Press, vol. 16(5), pages 975-1006.
    20. Ron Boschma & Lars Coenen & Koen Frenken & Bernhard Truffer, 2016. "Towards a theory of regional diversification," Papers in Evolutionary Economic Geography (PEEG) 1617, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Jul 2016.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:osf:socarx:h572n. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: OSF (email available below). General contact details of provider: https://arabixiv.org .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.