
Machine learning classification of entrepreneurs in British historical census data

Author

Listed:
  • Montebruno, Piero
  • Bennett, Robert
  • Smith, Harry
  • van Lieshout, Carry

Abstract

This paper presents a binary classification of entrepreneurs in British historical data, made possible by the recent availability of big data from the I-CeM dataset. The main task is to attribute an employment status to individuals who did not fully report entrepreneur status in the earlier censuses (1851-1881). The paper assesses the accuracy of different classifiers and machine learning algorithms, including Deep Learning, for this classification problem. We first use a ground-truth dataset from the later censuses to train a Logistic Regression (the standard method in the literature for this kind of binary classification) to distinguish entrepreneurs from non-entrepreneurs (i.e. workers). The accuracy of this baseline method is 0.74. We then compare the Logistic Regression with ten optimized machine learning algorithms: Nearest Neighbors, Linear and Radial Support Vector Machine, Gaussian Process, Decision Tree, Random Forest, Neural Network, AdaBoost, Naive Bayes, and Quadratic Discriminant Analysis. The best results among these come from boosting and ensemble methods: AdaBoost achieves an accuracy of 0.95. Deep Learning, as a standalone category of algorithms, further improves accuracy to 0.96 without using the rich text data of the OccString feature, a string of up to 500 characters holding the full occupational statement of each individual collected in the earlier censuses. Finally, using the OccString feature, we implement both shallow learning (a bag-of-words algorithm) and Deep Learning (a Recurrent Neural Network with a Long Short-Term Memory layer). These methods all achieve accuracies above 0.99; the Deep Learning Recurrent Neural Network is the best model, with an accuracy of 0.9978. The results show that the standard algorithms for classification can be outperformed by machine learning algorithms. This confirms the value of extending the techniques traditionally used in the literature for this type of classification problem.
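
To make the bag-of-words idea in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a handful of invented occupational statements (they are not taken from I-CeM), represents each one as token counts, and fits a plain logistic regression by stochastic gradient descent using only the Python standard library. All names (`TRAIN`, `tokenize`, `build_vocab`, `vectorize`, `train`, `classify`) are hypothetical, and a toy of this size says nothing about the accuracies reported in the paper.

```python
import math
from collections import Counter

# Invented toy data (NOT from I-CeM): occupational statement -> label,
# where 1 = entrepreneur and 0 = worker.
TRAIN = [
    ("farmer of 200 acres employing 6 men", 1),
    ("master builder employing 12 men and 3 boys", 1),
    ("grocer employing 2 assistants", 1),
    ("boot maker master employing 4 men", 1),
    ("agricultural labourer", 0),
    ("carpenter journeyman", 0),
    ("grocer's assistant", 0),
    ("cotton weaver", 0),
]


def tokenize(text):
    """Lower-case, turn apostrophes into spaces, and split on whitespace."""
    return text.lower().replace("'", " ").split()


def build_vocab(texts):
    """Map each distinct training token to a column index."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Bag-of-words as a sparse {column: count} dict.

    Word order is discarded and tokens outside the vocabulary are ignored.
    """
    counts = Counter(t for t in tokenize(text) if t in vocab)
    return {vocab[t]: c for t, c in counts.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(text, vocab, weights, bias):
    """Estimated probability that a statement belongs to class 1."""
    x = vectorize(text, vocab)
    return sigmoid(bias + sum(weights[i] * v for i, v in x.items()))


def train(data, vocab, epochs=200, lr=0.5):
    """Fit logistic regression on token counts by per-example gradient steps."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    rows = [(vectorize(text, vocab), label) for text, label in data]
    for _ in range(epochs):
        for x, label in rows:
            p = sigmoid(bias + sum(weights[i] * v for i, v in x.items()))
            error = p - label  # gradient of the log-loss w.r.t. the logit
            for i, v in x.items():
                weights[i] -= lr * error * v
            bias -= lr * error
    return weights, bias


VOCAB = build_vocab(text for text, _ in TRAIN)
WEIGHTS, BIAS = train(TRAIN, VOCAB)


def classify(text, threshold=0.5):
    """Return 1 (entrepreneur) or 0 (worker) for an occupational statement."""
    return int(predict_proba(text, VOCAB, WEIGHTS, BIAS) >= threshold)
```

Calling `classify` on a statement that was not in the toy training set, for example `classify("draper employing 3 assistants")`, returns a predicted label based only on which known tokens appear. A recurrent model such as the LSTM mentioned in the abstract differs in that it reads the tokens in sequence rather than as unordered counts.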

Suggested Citation

  • Montebruno, Piero & Bennett, Robert & Smith, Harry & van Lieshout, Carry, 2019. "Machine learning classification of entrepreneurs in British historical census data," MPRA Paper 100469, University Library of Munich, Germany.
  • Handle: RePEc:pra:mprapa:100469

    Download full text from publisher

    File URL: https://mpra.ub.uni-muenchen.de/100469/1/MPRA_paper_100469.pdf
    File Function: original version
    Download Restriction: no

    File URL: https://mpra.ub.uni-muenchen.de/106931/49/MPRA_paper_106931.pdf
    File Function: revised version
    Download Restriction: no

    References listed on IDEAS

    1. Bennett, Robert & Montebruno, Piero & Smith, Harry & van Lieshout, Carry, 2019. "Entrepreneurial discrete choice: Modelling decisions between self-employment, employer and worker status. Working paper 15," MPRA Paper 103192, University Library of Munich, Germany.
    2. Blanchflower, David G & Oswald, Andrew J, 1998. "What Makes an Entrepreneur?," Journal of Labor Economics, University of Chicago Press, vol. 16(1), pages 26-60, January.
    3. Bennett, Robert & Montebruno, Piero & Smith, Harry & van Lieshout, Carry, 2018. "Reconstructing entrepreneur and business numbers for censuses 1851-81. Working paper 9," MPRA Paper 103529, University Library of Munich, Germany.
    4. Sophia Rabe-Hesketh & Anders Skrondal, 2012. "Multilevel and Longitudinal Modeling Using Stata, 3rd Edition," Stata Press books, StataCorp LP, edition 3, number mimus2, March.
    5. Parker,Simon C., 2006. "The Economics of Self-Employment and Entrepreneurship," Cambridge Books, Cambridge University Press, number 9780521030632.
    6. Kevin Schürer & Tatiana Penkova & Yanshan Shi, 2015. "Standardising and Coding Birthplace Strings and Occupational Titles in the British Censuses of 1851 to 1911," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 48(4), pages 195-213, October.
    7. Bennett, Robert & Montebruno, Piero & Smith, Harry & van Lieshout, Carry, 2019. "Reconstructing business proprietor responses for censuses 1851-81: a tailored logit cut-off method. Working paper 9.2," MPRA Paper 103206, University Library of Munich, Germany.

    Citations


    Cited by:

    1. Ivana Lolić & Petar Sorić & Marija Logarušić, 2022. "Economic Policy Uncertainty Index Meets Ensemble Learning," Computational Economics, Springer;Society for Computational Economics, vol. 60(2), pages 401-437, August.
    2. Graham, Byron & Bonner, Karen, 2022. "One size fits all? Using machine learning to study heterogeneity and dominance in the determinants of early-stage entrepreneurship," Journal of Business Research, Elsevier, vol. 152(C), pages 42-59.
    3. Bennett, Robert & Montebruno, Piero & Smith, Harry & van Lieshout, Carry, 2019. "Reconstructing business proprietor responses for censuses 1851-81: a tailored logit cut-off method. Working paper 9.2," MPRA Paper 103206, University Library of Munich, Germany.
    4. Robert J. Bennett & Harry Smith & Piero Montebruno & Carry van Lieshout, 2022. "Changes in Victorian entrepreneurship in England and Wales 1851-1911: Methodology and business population estimates," Business History, Taylor & Francis Journals, vol. 64(7), pages 1211-1243, September.
    5. Mehmet Güney Celbiş, 2021. "A machine learning approach to rural entrepreneurship," Papers in Regional Science, Wiley Blackwell, vol. 100(4), pages 1079-1104, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Isabel Grilo & Roy Thurik, 2008. "Determinants of entrepreneurial engagement levels in Europe and the US," Industrial and Corporate Change, Oxford University Press and the Associazione ICC, vol. 17(6), pages 1113-1145, December.
    2. Luis Medrano-Adán & Vicente Salas-Fumás & J. Sanchez-Asin, 2015. "Heterogeneous entrepreneurs from occupational choices in economies with minimum wages," Small Business Economics, Springer, vol. 44(3), pages 597-619, March.
    3. Francesco Quatraro & Marco Vivarelli, 2015. "Drivers of Entrepreneurship and Post-entry Performance of Newborn Firms in Developing Countries," The World Bank Research Observer, World Bank, vol. 30(2), pages 277-305.
    4. Nathalie Colombier & David Masclet, 2008. "Intergenerational correlation in self employment: some further evidence from French ECHP data," Small Business Economics, Springer, vol. 30(4), pages 423-437, April.
    5. Sander Wennekers & Roy Thurik & André Stel & Niels Noorderhaven, 2010. "Uncertainty Avoidance and the Rate of Business Ownership Across 21 OECD Countries, 1976–2004," Springer Books, in: Andreas Freytag & Roy Thurik (ed.), Entrepreneurship and Culture, chapter 0, pages 271-299, Springer.
    6. Stijn Baert & Bas van der Klaauw & Gijsbert van Lomwel, 2018. "The effectiveness of medical and vocational interventions for reducing sick leave of self‐employed workers," Health Economics, John Wiley & Sons, Ltd., vol. 27(2), pages 139-152, February.
    7. Marcén, Miriam, 2013. "The effect of culture on self-employment," MPRA Paper 47338, University Library of Munich, Germany.
    8. Magnus Henrekson & Jesper Roine, 2007. "Promoting Entrepreneurship in the Welfare State," Chapters, in: David B. Audretsch & Isabel Grilo & A. Roy Thurik (ed.), Handbook of Research on Entrepreneurship Policy, chapter 5, Edward Elgar Publishing.
    9. Theodore Lianos & Anastasia Pseiridis, 2009. "On the occupational choices of return migrants," Entrepreneurship & Regional Development, Taylor & Francis Journals, vol. 21(2), pages 155-181, March.
    10. P. Mueller, 2006. "Entrepreneurship in the Region: Breeding Ground for Nascent Entrepreneurs?," Small Business Economics, Springer, vol. 27(1), pages 41-58, August.
    11. Marcén, Miriam, 2014. "The role of culture on self-employment," Economic Modelling, Elsevier, vol. 44(S1), pages 20-32.
    12. Démurger, Sylvie & Xu, Hui, 2011. "Return Migrants: The Rise of New Entrepreneurs in Rural China," World Development, Elsevier, vol. 39(10), pages 1847-1861.
    13. Milo Bianchi & Magnus Henrekson, 2005. "Is Neoclassical Economics still Entrepreneurless?," Kyklos, Wiley Blackwell, vol. 58(3), pages 353-377, July.
    14. Isabel Grilo & Roy Thurik, 2005. "Entrepreneurial engagement levels in the European Union," Papers on Entrepreneurship, Growth and Public Policy 2005-29, Max Planck Institute of Economics, Entrepreneurship, Growth and Public Policy Group.
    15. Viinikainen, Jutta & Heineck, Guido & Böckerman, Petri & Hintsanen, Mirka & Raitakari, Olli & Pehkonen, Jaakko, 2017. "Born entrepreneurs? Adolescents’ personality characteristics and entrepreneurship in adulthood," Journal of Business Venturing Insights, Elsevier, vol. 8(C), pages 9-12.
    16. Catia Batista & Janis Umblijs, 2014. "Migration, risk attitudes, and entrepreneurship: evidence from a representative immigrant survey," IZA Journal of Migration and Development, Springer;Forschungsinstitut zur Zukunft der Arbeit GmbH (IZA), vol. 3(1), pages 1-25, December.
    17. Catherine Laffineur & Saulo Dubard Barbosa & Alain Fayolle & Emeran Nziali, 2017. "Active labor market programs’ effects on entrepreneurship and unemployment," Small Business Economics, Springer, vol. 49(4), pages 889-918, December.
    18. Altin Vejsiu, 2011. "Incentives to self-employment decision in Sweden," International Review of Applied Economics, Taylor & Francis Journals, vol. 25(4), pages 379-403.
    19. Junfu Zhang & Zhong Zhao, 2015. "Social-family network and self-employment: evidence from temporary rural–urban migrants in China," IZA Journal of Labor & Development, Springer;Forschungsinstitut zur Zukunft der Arbeit GmbH (IZA), vol. 4(1), pages 1-21, December.
    20. Erik Stam & David Audretsch & Joris Meijaard, 2009. "Renascent entrepreneurship," Springer Books, in: Uwe Cantner & Jean-Luc Gaffard & Lionel Nesta (ed.), Schumpeterian Perspectives on Innovation, Competition and Growth, pages 223-237, Springer.
      • Stam, F.C. & Audretsch, D.B. & Meijaard, J., 2006. "Renascent Entrepreneurship," ERIM Report Series Research in Management ERS-2006-017-ORG, Erasmus Research Institute of Management (ERIM), ERIM is the joint research institute of the Rotterdam School of Management, Erasmus University and the Erasmus School of Economics (ESE) at Erasmus University Rotterdam.

    More about this item

    Keywords

    machine learning; deep learning; logistic regression; classification; big data; census;

    JEL classification:

    • M13 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Business Administration - - - New Firms; Startups
    • N83 - Economic History - - Micro-Business History - - - Europe: Pre-1913
