Printed from https://ideas.repec.org/a/taf/oaefxx/v3y2015i1p1045216.html

Statistical model selection with “Big Data”

Author

Listed:
  • Jurgen A. Doornik
  • David F. Hendry
  • Steve Cook

Abstract

Big Data offer potential benefits for statistical modelling, but confront problems including an excess of false positives, mistaking correlations for causes, ignoring sampling biases and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based relationship using Big Data, and the possible role of Autometrics in that context. Paramount considerations include embedding relationships in general initial models, possibly restricting the number of variables to be selected over by non-statistical criteria (the formulation problem), using good quality data on all variables, analyzed with tight significance levels by a powerful selection procedure, retaining available theory insights (the selection problem) while testing for relationships being well specified and invariant to shifts in explanatory variables (the evaluation problem), using a viable approach that resolves the computational problem of immense numbers of possible models.
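The abstract's call for tight significance levels can be illustrated with a small calculation. The following is a minimal sketch, not the authors' procedure and not Autometrics. It assumes N irrelevant candidate variables whose test statistics are independent standard normal draws under the null, so the expected number retained purely by chance at significance level alpha is alpha × N. The function names are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist


def expected_false_positives(n_candidates: int, alpha: float) -> float:
    """Expected number of irrelevant variables retained by chance (alpha * N)."""
    return alpha * n_candidates


def simulate_false_positives(n_candidates: int, alpha: float, seed: int = 0) -> int:
    """Count chance retentions among n_candidates irrelevant variables.

    Assumption (illustrative only): each test statistic is an independent
    N(0, 1) draw under the null, tested two-sided at level alpha.
    """
    rng = random.Random(seed)
    # Two-sided critical value from the standard normal distribution.
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    draws = (rng.gauss(0.0, 1.0) for _ in range(n_candidates))
    return sum(1 for z in draws if abs(z) > crit)


if __name__ == "__main__":
    n = 1000  # hypothetical number of candidate variables
    for alpha in (0.05, 0.01, 0.001):
        print(
            f"alpha={alpha}: expected={expected_false_positives(n, alpha):.1f}, "
            f"simulated={simulate_false_positives(n, alpha)}"
        )
```

Under these assumptions, 1,000 irrelevant candidates tested at a 5% level yield about 50 spurious retentions on average, whereas a 0.1% level yields about one, which is why the choice of significance level matters more as the number of candidate variables grows.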

Suggested Citation

  • Jurgen A. Doornik & David F. Hendry & Steve Cook, 2015. "Statistical model selection with “Big Data”," Cogent Economics & Finance, Taylor & Francis Journals, vol. 3(1), pages 1045216-104, December.
  • Handle: RePEc:taf:oaefxx:v:3:y:2015:i:1:p:1045216
    DOI: 10.1080/23322039.2015.1045216

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/23322039.2015.1045216
    Download Restriction: Access to full text is restricted to subscribers.

    References listed on IDEAS

    1. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2011. "Evaluating Automatic Model Selection," Journal of Time Series Econometrics, De Gruyter, vol. 3(1), pages 1-33, February.
    2. Hendry, David F. & Massmann, Michael, 2007. "Co-Breaking: Recent Advances and a Synopsis of the Literature," Journal of Business & Economic Statistics, American Statistical Association, vol. 25, pages 33-51, January.
    3. David Hendry & Jurgen A. Doornik & Felix Pretis, 2013. "Step-indicator Saturation," Economics Series Working Papers 658, University of Oxford, Department of Economics.
    4. Hendry, David F. & Johansen, Søren, 2015. "Model Discovery and Trygve Haavelmo’s Legacy," Econometric Theory, Cambridge University Press, vol. 31(1), pages 93-114, February.
    5. White, Halbert, 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, Econometric Society, vol. 48(4), pages 817-838, May.
    6. David F. Hendry & Hans-Martin Krolzig, 2005. "The Properties of Automatic "GETS" Modelling," Economic Journal, Royal Economic Society, vol. 115(502), pages 32-61, March.
    7. Jennifer L. Castle & Jurgen A. Doornik & David F. Hendry, 2013. "Model Selection in Equations with Many ‘Small’ Effects," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 75(1), pages 6-22, February.
    8. David F. Hendry & Bent Nielsen, 2007. "Preface to Econometric Modeling: A Likelihood Approach," Introductory Chapters, in: Econometric Modeling: A Likelihood Approach, Princeton University Press.
    9. Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2012. "Model selection when there are multiple breaks," Journal of Econometrics, Elsevier, vol. 169(2), pages 239-246.
    10. Carlos Santos & David Hendry & Soren Johansen, 2008. "Automatic selection of indicators in a fully saturated regression," Computational Statistics, Springer, vol. 23(2), pages 317-335, April.
    11. Godfrey, Leslie G, 1978. "Testing for Higher Order Serial Correlation in Regression Equations When the Regressors Include Lagged Dependent Variables," Econometrica, Econometric Society, vol. 46(6), pages 1303-1310, November.
    12. Jurgen A. Doornik & Henrik Hansen, 2008. "An Omnibus Test for Univariate and Multivariate Normality," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 70(s1), pages 927-939, December.
    13. David F. Hendry & Carlos Santos, 2005. "Regression Models with Data-based Indicator Variables," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 67(5), pages 571-595, October.
    14. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    15. Govaerts, Bernadette & Hendry, David F. & Richard, Jean-Francois, 1994. "Encompassing in stationary linear dynamic models," Journal of Econometrics, Elsevier, vol. 63(1), pages 245-270, July.
    16. David Hendry & Carlos Santos, 2010. "An Automatic Test of Super Exogeneity," Economics Series Working Papers 476, University of Oxford, Department of Economics.
    17. Mizon, Grayham E & Richard, Jean-Francois, 1986. "The Encompassing Principle and Its Application to Testing Non-nested Hypotheses," Econometrica, Econometric Society, vol. 54(3), pages 657-678, May.
    18. Castle, Jennifer L. & Hendry, David F., 2010. "A low-dimension portmanteau test for non-linearity," Journal of Econometrics, Elsevier, vol. 158(2), pages 231-245, October.
    19. Salkever, David S., 1976. "The use of dummy variables to compute predictions, prediction errors, and confidence intervals," Journal of Econometrics, Elsevier, vol. 4(4), pages 393-397, November.
    20. Jurgen A. Doornik, 2008. "Encompassing and Automatic Model Selection," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 70(s1), pages 915-925, December.
    21. David F. Hendry & Bent Nielsen, 2007. "The Bernoulli model, from Econometric Modeling: A Likelihood Approach," Introductory Chapters, in: Econometric Modeling: A Likelihood Approach, Princeton University Press.
    22. Engle, Robert F, 1982. "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation," Econometrica, Econometric Society, vol. 50(4), pages 987-1007, July.

    Citations


    Cited by:

    1. repec:taf:oaefxx:v:4:y:2016:i:1:p:1170096 is not listed on IDEAS
    2. repec:taf:jnlbes:v:34:y:2016:i:4:p:642-660 is not listed on IDEAS
    3. Castañeda, Gonzalo & Chávez-Juárez, Florian & Guerrero, Omar A., 2018. "How do governments determine policy priorities? Studying development strategies through spillover networks," Journal of Economic Behavior & Organization, Elsevier, vol. 154(C), pages 335-361.
    4. Schintler, Laurie A. & Fischer, Manfred M., 2018. "Big Data and Regional Science: Opportunities, Challenges, and Directions for Future Research," Working Papers in Regional Science 6122, WU Vienna University of Economics and Business.

    More about this item

    JEL classification:

    • C51 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Construction and Estimation
    • C22 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Time-Series Models; Dynamic Quantile Regressions; Dynamic Treatment Effect Models; Diffusion Processes
