
An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Author

Listed:
  • Antony Davies

    (Department of Economics, Duquesne University; The Mercatus Center, George Mason University)

Abstract

Regression analysis is intended to be used when the researcher seeks to test a given hypothesis against a data set. Unfortunately, in many applications it is either not possible to specify a hypothesis, typically because the research is in a very early stage, or it is not desirable to form a hypothesis, typically because the number of potential explanatory variables is very large. In these cases, researchers have resorted either to overt data mining techniques such as stepwise regression, or covert data mining techniques such as running variations on regression models prior to running the final model (also known as “data peeking”). While data mining side-steps the need to form a hypothesis, it is highly susceptible to generating spurious results. This paper draws on the known properties of OLS estimators in the presence of omitted and extraneous variable models to propose a procedure for data mining that attempts to distinguish between parameter estimates that are significant due to an underlying structural relationship and those that are significant due to random chance.
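
The abstract's warning that data mining is highly susceptible to spurious results can be illustrated with a small simulation. The sketch below is not the paper's procedure; it is a minimal illustration under stated assumptions: the dependent variable and every candidate regressor are independent standard normal noise, each candidate is screened in its own simple OLS regression, and significance is judged against an approximate two-sided 5% critical value of 1.984 (Student's t with 98 degrees of freedom). The names `slope_t_stat` and `count_spurious` are hypothetical and chosen only for this example.

```python
import math
import random


def slope_t_stat(x, y):
    """t-statistic for the slope in a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx                      # OLS slope estimate
    alpha = mean_y - beta * mean_x        # OLS intercept estimate
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return beta / se_beta


def count_spurious(n_obs=100, n_candidates=200, t_crit=1.984, seed=0):
    """Count candidate regressors that look significant although none is related to y.

    Assumption: y and every candidate x are independent N(0, 1) draws, so any
    'significant' slope is due to chance alone.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(slope_t_stat(x, y)) > t_crit:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious()
    print(f"{hits} of 200 unrelated regressors look significant at the 5% level")
```

With a 5% test, roughly one candidate in twenty is expected to pass the screen by chance, so a researcher who mines a large pool of variables will usually find some "significant" regressors even when no structural relationship exists.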

Suggested Citation

  • Antony Davies, 2008. "An Exploration of Regression-Based Data Mining Techniques Using Super Computation," Working Papers 2008-008, The George Washington University, Department of Economics, H. O. Stekler Research Program on Forecasting.
  • Handle: RePEc:gwc:wpaper:2008-008

    Download full text from publisher

    File URL: https://www2.gwu.edu/~forcpgm/2008-008.pdf
    File Function: First version, 2008
    Download Restriction: no

    References listed on IDEAS

    1. Yatchew, Adonis & Griliches, Zvi, 1985. "Specification Error in Probit Models," The Review of Economics and Statistics, MIT Press, vol. 67(1), pages 134-139, February.
    2. Davies, Antony, 2006. "A framework for decomposing shocks and measuring volatilities derived from multi-dimensional panel data of survey forecasts," International Journal of Forecasting, Elsevier, vol. 22(2), pages 373-393.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Campbell, Randall C. & Nagel, Gregory L., 2016. "Private information and limitations of Heckman's estimator in banking and corporate finance research," Journal of Empirical Finance, Elsevier, vol. 37(C), pages 186-195.
    2. Anthony Edo & Nicolas Jacquemet & Constantine Yannelis, 2019. "Language skills and homophilous hiring discrimination: Evidence from gender and racially differentiated applications," Review of Economics of the Household, Springer, vol. 17(1), pages 349-376, March.
    3. Bedri Kamil Onur Taş, 2016. "Does the Federal Reserve have Private Information about its Future Actions?," Economica, London School of Economics and Political Science, vol. 83(331), pages 498-517, July.
    4. Richard Williams, 2009. "Using Heterogeneous Choice Models to Compare Logit and Probit Coefficients Across Groups," Sociological Methods & Research, vol. 37(4), pages 531-559, May.
    5. Hoetker, Glenn, 2004. "Confounded Coefficients: Accurately Comparing Logit and Probit Coefficients across Groups," Working Papers 03-0100, University of Illinois at Urbana-Champaign, College of Business.
    6. Charlie Tchinda & Marcus Dejardin, 2021. "Are Business Policy Measures in Response to the COVID-19 Pandemic to Be Equally Valued? An Exploration According to SMEs Owners’ Business Expectations," Sustainability, MDPI, vol. 13(21), pages 1-42, October.
    7. Joshua Lospinoso & Michael Schweinberger & Tom Snijders & Ruth Ripley, 2011. "Assessing and accounting for time heterogeneity in stochastic actor oriented models," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 5(2), pages 147-176, July.
    8. Pedro Garcia‐del‐Barrio & Pablo Agnese, 2023. "To comply or not to comply? How a UEFA wage‐to‐revenue requirement might affect the sport and managerial performance of soccer clubs," Managerial and Decision Economics, John Wiley & Sons, Ltd., vol. 44(2), pages 767-786, March.
    9. Ángela González Arbeláez, 2010. "Determinantes del riesgo del crédito comercial en Colombia," Vniversitas Económica 8215, Universidad Javeriana - Bogotá.
    10. Wixe, Sofia & Nilsson, Pia & Naldi, Lucia & Westlund, Hans, 2017. "Disentangling Innovation in Small Food Firms: The role of External Knowledge, Support, and Collaboration," Working Paper Series in Economics and Institutions of Innovation 446, Royal Institute of Technology, CESIS - Centre of Excellence for Science and Innovation Studies.
    11. Aslund, O., 2000. "Immigrant Settlement Policies and Subsequent Migration," Papers 2000-23, Uppsala - Working Paper Series.
    12. Carlsson, Fredrik & Johansson-Stenman, Olof, 2006. "Should We Trust Hypothetical Referenda? Test and Identification Problems," Working Papers in Economics 189, University of Gothenburg, Department of Economics, revised 24 Jan 2006.
    13. Richard T. Boylan, 2012. "The Effect of Punishment Severity on Plea Bargaining," Journal of Law and Economics, University of Chicago Press, vol. 55(3), pages 565-591.
    14. Oriana Bandiera, 1999. "On the Structure of Tenancy contracts: Theory and Evidence from 19th Century Rural Sicily," STICERD - Development Economics Papers - From 2008 this series has been superseded by Economic Organisation and Public Policy Discussion Papers 19, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    15. Reback, Randall, 2004. "The impact of college course offerings on the supply of academically talented public school teachers," Journal of Econometrics, Elsevier, vol. 121(1-2), pages 377-404.
    16. Arduini, Davide & Belotti, Federico & Denni, Mario & Giungato, Gerolamo & Zanfei, Antonello, 2010. "Technology adoption and innovation in public services the case of e-government in Italy," Information Economics and Policy, Elsevier, vol. 22(3), pages 257-275, July.
    17. William Reed, 2003. "Information and Economic Interdependence," Journal of Conflict Resolution, Peace Science Society (International), vol. 47(1), pages 54-71, February.
    18. Wu, Fang & Swait, Joffre & Chen, Yuxin, 2019. "Feature-based attributes and the roles of consumers' perception bias and inference in choice," International Journal of Research in Marketing, Elsevier, vol. 36(2), pages 325-340.
    19. Terry N. Flynn & Elisabeth Huynh & Tim J. Peters & Hareth Al‐Janabi & Sam Clemens & Alison Moody & Joanna Coast, 2015. "Scoring the Icecap‐a Capability Instrument. Estimation of a UK General Population Tariff," Health Economics, John Wiley & Sons, Ltd., vol. 24(3), pages 258-269, March.
    20. Luiz Alberto Esteves, 2009. "O papel da produção de conhecimento tecnológico na internacionalização das empresas industriais brasileiras," Working Papers 0090, Universidade Federal do Paraná, Department of Economics.

    More about this item

    Keywords

    exhaustive; regression; all subsets; stepwise; data mining;

    JEL classification:

    • C10 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - General
    • C40 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - General
    • C63 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Computational Techniques

