Printed from https://ideas.repec.org/p/pra/mprapa/100469.html

Machine learning classification of entrepreneurs in British historical census data

Author

Listed:
  • Montebruno, Piero
  • Bennett, Robert
  • Smith, Harry
  • van Lieshout, Carry

Abstract

This paper presents a binary classification of entrepreneurs in British historical data, drawing on the recent availability of big data from the I-CeM dataset. The main task of the paper is to attribute an employment status to individuals who did not fully report entrepreneur status in the earlier censuses (1851-1881). The paper assesses the accuracy of different classifiers and machine learning algorithms, including Deep Learning, for this classification problem. We first adopt a ground-truth dataset from the later censuses to train a Logistic Regression (the standard method in the literature for this kind of binary classification) to distinguish entrepreneurs from non-entrepreneurs (i.e. workers). The accuracy of this baseline method is 0.74. We compare the Logistic Regression with ten optimized machine learning algorithms: Nearest Neighbors, Linear and Radial Support Vector Machine, Gaussian Process, Decision Tree, Random Forest, Neural Network, AdaBoost, Naive Bayes, and Quadratic Discriminant Analysis. Boosting and ensemble methods give the best results: AdaBoost achieves an accuracy of 0.95. Deep Learning, as a standalone category of algorithms, further improves accuracy to 0.96 without using the rich text data of the OccString feature, a string of up to 500 characters containing the full occupational statement of each individual collected in the earlier censuses. Finally, now using this OccString feature, we implement both shallow learning (a bag-of-words algorithm) and Deep Learning (a Recurrent Neural Network with a Long Short-Term Memory layer). These methods all achieve accuracies above 0.99, with the Deep Learning Recurrent Neural Network the best model at an accuracy of 0.9978. The results show that the standard classification algorithms can be outperformed by machine learning algorithms. This confirms the value of extending the techniques traditionally used in the literature for this type of classification problem.
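
The shallow-learning step described in the abstract (a bag-of-words representation of an occupational string fed to a binary classifier) can be illustrated with a minimal sketch. This is not the authors' code or pipeline: the example strings and labels are invented for illustration and are not taken from I-CeM, the tokenizer is a plain whitespace split, and the classifier is a small Logistic Regression fitted by stochastic gradient descent using only the Python standard library. All function names (tokenize, build_vocab, vectorize, fit, accuracy) are hypothetical.

```python
import math
from collections import Counter

# Invented toy examples, NOT taken from I-CeM. Label 1 = entrepreneur, 0 = worker.
TRAIN = [
    ("farmer of 200 acres employing 6 men", 1),
    ("master baker employing 2 boys", 1),
    ("grocer master employing 1 man", 1),
    ("builder employing 12 men", 1),
    ("agricultural labourer", 0),
    ("baker journeyman", 0),
    ("grocer assistant", 0),
    ("bricklayer labourer", 0),
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def build_vocab(texts):
    """Map each distinct token to a column index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Sparse bag-of-words vector {column: count}; unseen tokens are dropped."""
    counts = Counter(tokenize(text))
    return {vocab[t]: float(c) for t, c in counts.items() if t in vocab}

def predict_proba(x, w, b):
    """Probability of the positive class under the logistic model."""
    z = b + sum(w[i] * v for i, v in x.items())
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit(X, y, n_features, lr=0.5, epochs=200):
    """Logistic regression fitted by plain stochastic gradient descent."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = predict_proba(x, w, b) - target  # d(log-loss)/dz
            b -= lr * err
            for i, v in x.items():
                w[i] -= lr * err * v
    return w, b

def accuracy(X, y, w, b):
    """Share of examples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(x, w, b) >= 0.5) == bool(t) for x, t in zip(X, y))
    return hits / len(y)

texts = [t for t, _ in TRAIN]
labels = [l for _, l in TRAIN]
vocab = build_vocab(texts)
X = [vectorize(t, vocab) for t in texts]
w, b = fit(X, labels, len(vocab))

print("training accuracy:", accuracy(X, labels, w, b))
unseen = "farmer employing 3 men"  # hypothetical unseen occupational string
print("P(entrepreneur):", round(predict_proba(vectorize(unseen, vocab), w, b), 3))
```

Accuracy here is measured on the same eight invented strings used for fitting, so it says nothing about the figures reported in the abstract; a real evaluation would score the model on held-out ground-truth records.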

Suggested Citation

  • Montebruno, Piero & Bennett, Robert & Smith, Harry & van Lieshout, Carry, 2019. "Machine learning classification of entrepreneurs in British historical census data," MPRA Paper 100469, University Library of Munich, Germany.
  • Handle: RePEc:pra:mprapa:100469

    Download full text from publisher

    File URL: https://mpra.ub.uni-muenchen.de/100469/1/MPRA_paper_100469.pdf
    File Function: original version
    Download Restriction: no

    File URL: https://mpra.ub.uni-muenchen.de/106931/49/MPRA_paper_106931.pdf
    File Function: revised version
    Download Restriction: no

    References listed on IDEAS

    1. Bennett, Robert & Montebruno, Piero & Smith, Harry & van Lieshout, Carry, 2019. "Entrepreneurial discrete choice: Modelling decisions between self-employment, employer and worker status. Working paper 15," MPRA Paper 103192, University Library of Munich, Germany.
    2. Kevin Schürer & Tatiana Penkova & Yanshan Shi, 2015. "Standardising and Coding Birthplace Strings and Occupational Titles in the British Censuses of 1851 to 1911," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 48(4), pages 195-213, October.
    3. Blanchflower, David G & Oswald, Andrew J, 1998. "What Makes an Entrepreneur?," Journal of Labor Economics, University of Chicago Press, vol. 16(1), pages 26-60, January.
    4. Bennett, Robert & Montebruno, Piero & Smith, Harry & van Lieshout, Carry, 2018. "Reconstructing entrepreneur and business numbers for censuses 1851-81. Working paper 9," MPRA Paper 103529, University Library of Munich, Germany.
    5. Montebruno, Piero & Bennett, Robert J. & van Lieshout, Carry & Smith, Harry, 2019. "A tale of two tails: Do Power Law and Lognormal models fit firm-size distributions in the mid-Victorian era?," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 858-875.
    6. Bennett, Robert & Montebruno, Piero & Smith, Harry & van Lieshout, Carry, 2019. "Reconstructing business proprietor responses for censuses 1851-81: a tailored logit cut-off method. Working paper 9.2," MPRA Paper 103206, University Library of Munich, Germany.
    7. Sophia Rabe-Hesketh & Anders Skrondal, 2012. "Multilevel and Longitudinal Modeling Using Stata, 3rd Edition," Stata Press books, StataCorp LLC, edition 3, number mimus2, March.
    8. Parker,Simon C., 2006. "The Economics of Self-Employment and Entrepreneurship," Cambridge Books, Cambridge University Press, number 9780521030632, January.

    Citations


    Cited by:

    1. Ivana Lolić & Petar Sorić & Marija Logarušić, 2022. "Economic Policy Uncertainty Index Meets Ensemble Learning," Computational Economics, Springer;Society for Computational Economics, vol. 60(2), pages 401-437, August.
    2. Robert J. Bennett & Harry Smith & Piero Montebruno & Carry van Lieshout, 2022. "Changes in Victorian entrepreneurship in England and Wales 1851-1911: Methodology and business population estimates," Business History, Taylor & Francis Journals, vol. 64(7), pages 1211-1243, September.
    3. Mehmet Güney Celbiş, 2021. "A machine learning approach to rural entrepreneurship," Papers in Regional Science, Wiley Blackwell, vol. 100(4), pages 1079-1104, August.
    4. Graham, Byron & Bonner, Karen, 2022. "One size fits all? Using machine learning to study heterogeneity and dominance in the determinants of early-stage entrepreneurship," Journal of Business Research, Elsevier, vol. 152(C), pages 42-59.
    5. Bennett, Robert & Montebruno, Piero & Smith, Harry & van Lieshout, Carry, 2019. "Reconstructing business proprietor responses for censuses 1851-81: a tailored logit cut-off method. Working paper 9.2," MPRA Paper 103206, University Library of Munich, Germany.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Milo Bianchi, 2012. "Financial Development, Entrepreneurship, and Job Satisfaction," The Review of Economics and Statistics, MIT Press, vol. 94(1), pages 273-286, February.
    2. Werner, Arndt, 2008. "Do Credit Constraints Matter more for College Dropout Entrepreneurs?," MPRA Paper 11867, University Library of Munich, Germany.
    3. Luis Medrano-Adán & Vicente Salas-Fumás & J. Sanchez-Asin, 2015. "Heterogeneous entrepreneurs from occupational choices in economies with minimum wages," Small Business Economics, Springer, vol. 44(3), pages 597-619, March.
    4. Nathalie Colombier & David Masclet, 2008. "Intergenerational correlation in self employment: some further evidence from French ECHP data," Small Business Economics, Springer, vol. 30(4), pages 423-437, April.
    5. Sander Wennekers & Roy Thurik & André Stel & Niels Noorderhaven, 2010. "Uncertainty Avoidance and the Rate of Business Ownership Across 21 OECD Countries, 1976–2004," Springer Books, in: Andreas Freytag & Roy Thurik (ed.), Entrepreneurship and Culture, chapter 0, pages 271-299, Springer.
    6. P. Köllinger & M. Minniti, 2006. "Not for Lack of Trying: American Entrepreneurship in Black and White," Small Business Economics, Springer, vol. 27(1), pages 59-79, August.
    7. Backman, Mikaela & Karlsson, Charlie, 2013. "Who says life is over after 55? Entrepreneurship and an aging population," Working Paper Series in Economics and Institutions of Innovation 325, Royal Institute of Technology, CESIS - Centre of Excellence for Science and Innovation Studies.
    8. Kunwon Ahn & John V. Winters, 2023. "Does education enhance entrepreneurship?," Small Business Economics, Springer, vol. 61(2), pages 717-743, August.
    9. Werner, Arndt, 2011. "Abbruch und Aufschub von Gründungsvorhaben: Eine empirische Analyse mit den Daten des Gründerpanels des IfM Bonn," IfM-Materialien 209, Institut für Mittelstandsforschung (IfM) Bonn.
    10. Marcén, Miriam, 2013. "The effect of culture on self-employment," MPRA Paper 47338, University Library of Munich, Germany.
    11. Magnus Henrekson & Jesper Roine, 2007. "Promoting Entrepreneurship in the Welfare State," Chapters, in: David B. Audretsch & Isabel Grilo & A. Roy Thurik (ed.), Handbook of Research on Entrepreneurship Policy, chapter 5, Edward Elgar Publishing.
    12. Rotger, Gabriel Pons & Gørtz, Mette & Storey, David J., 2012. "Assessing the effectiveness of guided preparation for new venture creation and performance: Theory and practice," Journal of Business Venturing, Elsevier, vol. 27(4), pages 506-521.
    13. Márton Gosztonyi & Csákné Filep Judit, 2022. "Profiling (Non-)Nascent Entrepreneurs in Hungary Based on Machine Learning Approaches," Sustainability, MDPI, vol. 14(6), pages 1-20, March.
    14. Theodore Lianos & Anastasia Pseiridis, 2009. "On the occupational choices of return migrants," Entrepreneurship & Regional Development, Taylor & Francis Journals, vol. 21(2), pages 155-181, March.
    15. Aloña Martiarena, 2013. "What’s so entrepreneurial about intrapreneurs?," Small Business Economics, Springer, vol. 40(1), pages 27-39, January.
    16. Juan Pérez Velasco Pavón, 2014. "Economic behavior of indigenous peoples: the Mexican case," Latin American Economic Review, Springer;Centro de Investigaciòn y Docencia Económica (CIDE), vol. 23(1), pages 1-58, December.
    17. P. Mueller, 2006. "Entrepreneurship in the Region: Breeding Ground for Nascent Entrepreneurs?," Small Business Economics, Springer, vol. 27(1), pages 41-58, August.
    18. Marcén, Miriam, 2014. "The role of culture on self-employment," Economic Modelling, Elsevier, vol. 44(S1), pages 20-32.
    19. Milo Bianchi & Magnus Henrekson, 2005. "Is Neoclassical Economics still Entrepreneurless?," Kyklos, Wiley Blackwell, vol. 58(3), pages 353-377, July.
    20. Robert W. Fairlie & Alicia Robb, 2007. "Families, Human Capital, and Small Business: Evidence from the Characteristics of Business Owners Survey," ILR Review, Cornell University, ILR School, vol. 60(2), pages 225-245, January.

    More about this item

    JEL classification:

    • M13 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Business Administration - - - New Firms; Startups
    • N83 - Economic History - - Micro-Business History - - - Europe: Pre-1913
