Printed from https://ideas.repec.org/a/gam/jrisks/v8y2020i2p32-d341113.html

A Note on Combining Machine Learning with Statistical Modeling for Financial Data Analysis

Author

Listed:
  • José María Sarabia

    (Department of Economics, University of Cantabria, 39005 Santander, Spain)

  • Faustino Prieto

    (Department of Economics, University of Cantabria, 39005 Santander, Spain)

  • Vanesa Jordá

    (Department of Economics, University of Cantabria, 39005 Santander, Spain)

  • Stefan Sperlich

    (Geneva School of Economics and Management, University of Geneva, 1211 Geneva, Switzerland)

Abstract

This note revisits the ideas behind so-called semiparametric methods, which we consider very useful when applying machine learning in insurance. To this end, we first recall the essence of semiparametrics: the mixing of global and local estimation, and the combining of explicit modeling with purely data-adaptive inference. We then discuss stepwise approaches that integrate machine learning in different ways. Furthermore, for the modeling of prior knowledge, we introduce classes of distribution families for financial data. The proposed procedures are illustrated with data on stock returns for five companies of the Spanish value-weighted index IBEX35.
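
As a rough illustration of what "mixing global and local estimation" in a stepwise fashion can look like, the sketch below first fits a global parametric model (a straight line by least squares) and then corrects it locally with a kernel smoother applied to the residuals. This is not the authors' procedure: the simulated data, the linear start model, the Gaussian Nadaraya-Watson smoother, the bandwidth of 0.15, and all function names are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def fit_linear(x, y):
    """Global step: ordinary least squares for y ~ a + b * x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # array [a, b]


def kernel_smooth(x_train, r_train, x_eval, bandwidth):
    """Local step: Gaussian-kernel (Nadaraya-Watson) average of r_train."""
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)
    return (weights @ r_train) / weights.sum(axis=1)


def stepwise_predict(x_train, y_train, x_eval, bandwidth=0.15):
    """Parametric fit plus a nonparametric correction of its residuals."""
    a, b = fit_linear(x_train, y_train)
    residuals = y_train - (a + b * x_train)
    correction = kernel_smooth(x_train, residuals, x_eval, bandwidth)
    return a + b * x_eval + correction


# Simulated data (an assumption): a linear trend plus a smooth deviation
# that the global linear model cannot capture.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-2.0, 2.0, 400))
y = 0.5 * x + 0.4 * np.sin(3.0 * x) + rng.normal(0.0, 0.1, x.size)

grid = np.linspace(-1.5, 1.5, 50)
grid_truth = 0.5 * grid + 0.4 * np.sin(3.0 * grid)

a, b = fit_linear(x, y)
global_only = a + b * grid
combined = stepwise_predict(x, y, grid)

mse_global = np.mean((global_only - grid_truth) ** 2)
mse_combined = np.mean((combined - grid_truth) ** 2)
print(f"MSE, global model only:      {mse_global:.4f}")
print(f"MSE, global + local residual: {mse_combined:.4f}")
```

The global model carries the explicit, interpretable structure, while the local step only has to pick up what that structure misses. The bandwidth is fixed by hand here; in practice it would be chosen by a data-driven selector.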

Suggested Citation

  • José María Sarabia & Faustino Prieto & Vanesa Jordá & Stefan Sperlich, 2020. "A Note on Combining Machine Learning with Statistical Modeling for Financial Data Analysis," Risks, MDPI, vol. 8(2), pages 1-14, April.
  • Handle: RePEc:gam:jrisks:v:8:y:2020:i:2:p:32-:d:341113

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-9091/8/2/32/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-9091/8/2/32/
    Download Restriction: no

    References listed on IDEAS

    1. Max Köhler & Anja Schindler & Stefan Sperlich, 2014. "A Review and Comparison of Bandwidth Selection Methods for Kernel Regression," International Statistical Review, International Statistical Institute, vol. 82(2), pages 243-274, August.
    2. Wenceslao González-Manteiga & Rosa Crujeiras, 2013. "Rejoinder on: An updated review of Goodness-of-Fit tests for regression models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 22(3), pages 442-447, September.
    3. Panayiotis Theodossiou, 1998. "Financial Data and the Skewed Generalized T Distribution," Management Science, INFORMS, vol. 44(12-Part-1), pages 1650-1661, December.
    4. Nielsen, Jens Perch & Sperlich, Stefan, 2003. "Prediction of Stock Returns: A New Way to Look at It," ASTIN Bulletin, Cambridge University Press, vol. 33(2), pages 399-417, November.
    5. Scholz, Michael & Sperlich, Stefan & Nielsen, Jens Perch, 2016. "Nonparametric long term prediction of stock returns with generated bond yields," Insurance: Mathematics and Economics, Elsevier, vol. 69(C), pages 82-96.
    6. Zhu, Dongming & Galbraith, John W., 2010. "A generalized asymmetric Student-t distribution with application to financial econometrics," Journal of Econometrics, Elsevier, vol. 157(2), pages 297-305, August.
    7. Alexander, Carol & Cordeiro, Gauss M. & Ortega, Edwin M.M. & Sarabia, José María, 2012. "Generalized beta-generated distributions," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1880-1897.
    8. Enno Mammen & Jens Perch Nielsen & Michael Scholz & Stefan Sperlich, 2019. "Conditional Variance Forecasts for Long-Term Stock Returns," Risks, MDPI, vol. 7(4), pages 1-22, November.
    9. Wenceslao González-Manteiga & Rosa Crujeiras, 2013. "An updated review of Goodness-of-Fit tests for regression models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 22(3), pages 361-411, September.
    10. Scholz, Michael & Nielsen, Jens Perch & Sperlich, Stefan, 2015. "Nonparametric prediction of stock returns based on yearly data: The long-term view," Insurance: Mathematics and Economics, Elsevier, vol. 65(C), pages 143-155.
    11. Ruppert,David & Wand,M. P. & Carroll,R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521780506.
    12. Jing Dai & Stefan Sperlich & Walter Zucchini, 2016. "A Simple Method for Predicting Distributions by Means of Covariates with Examples from Poverty and Health Economics," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 152(1), pages 49-80, January.
    13. Nils-Bastian Heidenreich & Anja Schindler & Stefan Sperlich, 2013. "Bandwidth selection for kernel density estimation: a review of fully automatic selectors," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 97(4), pages 403-433, October.
    14. Enno Mammen & Jens Perch Nielsen & Michael Scholz & Stefan Sperlich, 2019. "Conditional variance forecasts for long-term stock returns," Graz Economics Papers 2019-08, University of Graz, Department of Economics.
    15. Adelchi Azzalini & Antonella Capitanio, 2003. "Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t‐distribution," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(2), pages 367-389, May.
    16. Ruppert,David & Wand,M. P. & Carroll,R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521785167.
    17. M. Jones, 2004. "Families of distributions arising from distributions of order statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 13(1), pages 1-43, June.
    18. Lin, Yi & Jeon, Yongho, 2006. "Random Forests and Adaptive Nearest Neighbors," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 578-590, June.
    19. Bollerslev, Tim, 1987. "A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return," The Review of Economics and Statistics, MIT Press, vol. 69(3), pages 542-547, August.
    20. Inyoung Kim & Noah D. Cohen & Raymond J. Carroll, 2003. "Semiparametric Regression Splines in Matched Case-Control Studies," Biometrics, The International Biometric Society, vol. 59(4), pages 1158-1169, December.

    Citations


    Cited by:

    1. Vali Asimit & Ioannis Kyriakou & Jens Perch Nielsen, 2020. "Special Issue “Machine Learning in Insurance”," Risks, MDPI, vol. 8(2), pages 1-2, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ioannis Kyriakou & Parastoo Mousavi & Jens Perch Nielsen & Michael Scholz, 2020. "Longer-Term Forecasting of Excess Stock Returns—The Five-Year Case," Mathematics, MDPI, vol. 8(6), pages 1-20, June.
    2. Ioannis Kyriakou & Parastoo Mousavi & Jens Perch Nielsen & Michael Scholz, 2021. "Short-Term Exuberance and Long-Term Stability: A Simultaneous Optimization of Stock Return Predictions for Short and Long Horizons," Mathematics, MDPI, vol. 9(6), pages 1-19, March.
    3. Parastoo Mousavi, 2021. "Debt-by-Price Ratio, End-of-Year Economic Growth, and Long-Term Prediction of Stock Returns," Mathematics, MDPI, vol. 9(13), pages 1-18, July.
    4. Enno Mammen & Jens Perch Nielsen & Michael Scholz & Stefan Sperlich, 2019. "Conditional Variance Forecasts for Long-Term Stock Returns," Risks, MDPI, vol. 7(4), pages 1-22, November.
    5. Scholz, Michael & Nielsen, Jens Perch & Sperlich, Stefan, 2015. "Nonparametric prediction of stock returns based on yearly data: The long-term view," Insurance: Mathematics and Economics, Elsevier, vol. 65(C), pages 143-155.
    6. Ioannis Kyriakou & Parastoo Mousavi & Jens Perch Nielsen & Michael Scholz, 2020. "Short-Term Exuberance and long-term stability: A simultaneous optimization of stock return predictions for short and long horizons," Graz Economics Papers 2020-20, University of Graz, Department of Economics.
    7. Gerrard, Russell & Hiabu, Munir & Nielsen, Jens Perch & Vodička, Peter, 2020. "Long-term real dynamic investment planning," Insurance: Mathematics and Economics, Elsevier, vol. 92(C), pages 90-103.
    8. Dongming Zhu & John W. Galbraith, 2009. "Forecasting Expected Shortfall with a Generalized Asymmetric Student-t Distribution," CIRANO Working Papers 2009s-24, CIRANO.
    9. M. C. Jones, 2015. "On Families of Distributions with Shape Parameters," International Statistical Review, International Statistical Institute, vol. 83(2), pages 175-192, August.
    10. Ibrahim Ergen, 2015. "Two-step methods in VaR prediction and the importance of fat tails," Quantitative Finance, Taylor & Francis Journals, vol. 15(6), pages 1013-1030, June.
    11. Zhu, Dongming & Galbraith, John W., 2011. "Modeling and forecasting expected shortfall with the generalized asymmetric Student-t and asymmetric exponential power distributions," Journal of Empirical Finance, Elsevier, vol. 18(4), pages 765-778, September.
    12. Otto-Sobotka, Fabian & Salvati, Nicola & Ranalli, Maria Giovanna & Kneib, Thomas, 2019. "Adaptive semiparametric M-quantile regression," Econometrics and Statistics, Elsevier, vol. 11(C), pages 116-129.
    13. Timothy K.M. Beatty & Erling Røed Larsen, 2005. "Using Engel curves to estimate bias in the Canadian CPI as a cost of living index," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 38(2), pages 482-499, May.
    14. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Econométrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
    15. Inés Barbeito & Ricardo Cao & Stefan Sperlich, 2023. "Bandwidth selection for statistical matching and prediction," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(1), pages 418-446, March.
    16. Hyunju Son & Youyi Fong, 2021. "Fast grid search and bootstrap‐based inference for continuous two‐phase polynomial regression models," Environmetrics, John Wiley & Sons, Ltd., vol. 32(3), May.
    17. Michael Wegener & Göran Kauermann, 2017. "Forecasting in nonlinear univariate time series using penalized splines," Statistical Papers, Springer, vol. 58(3), pages 557-576, September.
    18. Dlugosz, Stephan & Mammen, Enno & Wilke, Ralf A., 2017. "Generalized partially linear regression with misclassified data and an application to labour market transitions," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 145-159.
    19. Bernhard Baumgartner & Daniel Guhl & Thomas Kneib & Winfried J. Steiner, 2018. "Flexible estimation of time-varying effects for frequently purchased retail goods: a modeling approach based on household panel data," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 40(4), pages 837-873, October.
    20. Zi Ye & Giles Hooker & Stephen P. Ellner, 2021. "Generalized Single Index Models and Jensen Effects on Reproduction and Survival," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(3), pages 492-512, September.
