
Identifying and Improving Functional Form Complexity: A Machine Learning Framework

Author

Listed:
  • Verhagen, Mark D.

Abstract

"All models are wrong, but some are useful" is an often-used mantra, particularly when a model's ability to capture the full complexity of social life is questioned. However, an appropriate functional form is key to valid statistical inference, and underestimating complexity can lead to biased results. Unfortunately, it is unclear a priori how complex a functional form should be. I propose using methods from machine learning to identify the appropriate complexity of the functional form by (i) estimating the fit potential of the outcome given a set of explanatory variables, (ii) comparing this potential with the fit of the functional form originally hypothesized by the researcher, and (iii) where a lack of fit is identified, using recent advances in explainable AI to gain insight into the missing complexity. I illustrate the approach with a range of simulation and real-world examples.
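
The three steps in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the simulated data, the k-nearest-neighbour learner standing in for the machine learning benchmark, and all function names are assumptions made for illustration. The final step uses a simple difference between the two sets of predictions as a crude stand-in for the explainable AI tools the abstract refers to.

```python
import random


def simulate(n, seed=0):
    """Hypothetical data: the outcome depends on x quadratically, plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [1.0 + 0.5 * x + 1.5 * x * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Hypothesized functional form y = a + b*x, fitted by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_knn(xs, ys, k=15):
    """Flexible learner (assumed stand-in): mean outcome of the k nearest points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def r_squared(model, xs, ys):
    """Out-of-sample R^2 of a fitted model on held-out data."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


xs, ys = simulate(600)
train_x, test_x = xs[:400], xs[400:]
train_y, test_y = ys[:400], ys[400:]

linear = fit_linear(train_x, train_y)
flexible = fit_knn(train_x, train_y)

# (i) estimate of the fit potential, from the flexible learner
r2_flexible = r_squared(flexible, test_x, test_y)
# (ii) fit of the functional form hypothesized by the researcher
r2_linear = r_squared(linear, test_x, test_y)
fit_gap = r2_flexible - r2_linear

# (iii) crude diagnostic: where do the flexible predictions depart from the
# hypothesized form? A U-shaped pattern points to a missing squared term.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
profile = [(x, flexible(x) - linear(x)) for x in grid]

print(f"R^2 linear: {r2_linear:.2f}, R^2 flexible: {r2_flexible:.2f}")
for x, diff in profile:
    print(f"x = {x:+.1f}: flexible minus linear = {diff:+.2f}")
```

A large gap between the two out-of-sample fits suggests the hypothesized form is too simple, and the sign pattern of the differences across the grid hints at which kind of term is missing. In practice the flexible learner and the diagnostic would be chosen to suit the data at hand.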

Suggested Citation

  • Verhagen, Mark D., 2021. "Identifying and Improving Functional Form Complexity: A Machine Learning Framework," SocArXiv bka76, Center for Open Science.
  • Handle: RePEc:osf:socarx:bka76
    DOI: 10.31219/osf.io/bka76

    Download full text from publisher

    File URL: https://osf.io/download/61a5f1655d085f093b0d9c88/
    Download Restriction: no



    Citations



    Cited by:

    1. Verhagen, Mark D., 2023. "Using machine learning to monitor the equity of large-scale policy interventions: The Dutch decentralisation of the Social Domain," SocArXiv qzm7y, Center for Open Science.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Akdeniz Duran, Esra & Härdle, Wolfgang Karl & Osipenko, Maria, 2012. "Difference based ridge and Liu type estimators in semiparametric regression models," Journal of Multivariate Analysis, Elsevier, vol. 105(1), pages 164-175.
    2. Michael Lokshin, 2006. "Difference-based semiparametric estimation of partial linear regression models," Stata Journal, StataCorp LLC, vol. 6(3), pages 377-383, September.
    3. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2021. "Modelling non-stationary ‘Big Data’," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1556-1575.
    4. repec:osf:socarx:y6mnb_v1 is not listed on IDEAS
    5. Shen, Chung-Wei & Tsou, Tsung-Shan & Balakrishnan, N., 2011. "Robust likelihood inference for regression parameters in partially linear models," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1696-1714, April.
    6. Panos Pashardes & Alexandros Polycarpou, 2015. "A backward-bending and forward-falling semi-log model of labour supply," University of Cyprus Working Papers in Economics 03-2015, University of Cyprus Department of Economics.
    7. Hongchang Hu & Yu Zhang & Xiong Pan, 2016. "Asymptotic normality of DHD estimators in a partially linear model," Statistical Papers, Springer, vol. 57(3), pages 567-587, September.
    8. Dong, Chaohua & Gao, Jiti & Linton, Oliver, 2023. "High dimensional semiparametric moment restriction models," Journal of Econometrics, Elsevier, vol. 232(2), pages 320-345.
    9. Raushan Kumar, 2017. "Price Discovery in Some Primary Commodity Markets in India," Economics Bulletin, AccessEcon, vol. 37(3), pages 1817-1829.
    10. Dawid Siwicki, 2021. "The Application of Machine Learning Algorithms for Spatial Analysis: Predicting of Real Estate Prices in Warsaw," Working Papers 2021-05, Faculty of Economic Sciences, University of Warsaw.
    11. Brunes, Fredrik & Hermansson, Cecilia & Song, Han-Suck & Wilhelmsson, Mats, 2016. "NIMBYs for the rich and YIMBYs for the poor: Analyzing the property price effects of infill development," Working Paper Series 16/2, Royal Institute of Technology, Department of Real Estate and Construction Management & Banking and Finance.
    12. Faroek Lazrak & Peter Nijkamp & Piet Rietveld & Jan Rouwendal, 2014. "The market value of cultural heritage in urban areas: an application of spatial hedonic pricing," Journal of Geographical Systems, Springer, vol. 16(1), pages 89-114, January.
    13. Chien-Chia L. Huang & Yow-Jen Jou & Hsun-Jung Cho, 2017. "Difference-based matrix perturbation method for semi-parametric regression with multicollinearity," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(12), pages 2161-2171, September.
    14. Behr, Daniela Monika & Chen, Lixue & Goel, Ankita & Haider, Khondoker Tanveer & Sandeep Singh & Zaman, Asad, 2023. "Estimating House Prices in Emerging Markets and Developing Economies: A Big Data Approach," Policy Research Working Paper Series 10301, The World Bank.
    15. Kun Duan & Tapas Mishra & Mamata Parhi & Simon Wolfe, 2019. "How Effective are Policy Interventions in a Spatially-Embedded International Real Estate Market?," The Journal of Real Estate Finance and Economics, Springer, vol. 58(4), pages 596-637, May.
    16. Cordera, Ruben & Chiarazzo, Vincenza & Ottomanelli, Michele & dell’Olio, Luigi & Ibeas, Angel, 2019. "The impact of undesirable externalities on residential property values: spatial regressive models and an empirical study," Transport Policy, Elsevier, vol. 80(C), pages 177-187.
    17. Simon K.C. Cheung, 2017. "A Localized Model for Residential Property Valuation: Nearest Neighbor with Attribute Differences," International Real Estate Review, Global Social Science Institute, vol. 20(2), pages 221-250.
    18. Fikri Akdeniz & Mahdi Roozbeh, 2019. "Generalized difference-based weighted mixed almost unbiased ridge estimator in partially linear models," Statistical Papers, Springer, vol. 60(5), pages 1717-1739, October.
    19. Petropoulos, Fotios & Apiletti, Daniele & Assimakopoulos, Vassilios & Babai, Mohamed Zied & Barrow, Devon K. & Ben Taieb, Souhaib & Bergmeir, Christoph & Bessa, Ricardo J. & Bijak, Jakub & Boylan, Joh, 2022. "Forecasting: theory and practice," International Journal of Forecasting, Elsevier, vol. 38(3), pages 705-871.
      • Fotios Petropoulos & Daniele Apiletti & Vassilios Assimakopoulos & Mohamed Zied Babai & Devon K. Barrow & Souhaib Ben Taieb & Christoph Bergmeir & Ricardo J. Bessa & Jakub Bijak & John E. Boylan & Jet, 2020. "Forecasting: theory and practice," Papers 2012.03854, arXiv.org, revised Jan 2022.
    20. Chang, Hung-Hao & Nayga Jr., Rodolfo M., 2011. "Mother's nutritional label use and children's body weight," Food Policy, Elsevier, vol. 36(2), pages 171-178, April.
    21. Pablo Picardo, 2019. "Predicción de precios de vivienda: Aprendizaje estadístico con datos de oferta y transacciones para la ciudad de Montevideo," Documentos de trabajo 2019002, Banco Central del Uruguay.
