J. Guillermo Llorente (Universidad Autonoma de Madrid); J. del Hoyo (Universidad Autonoma de Madrid)
Abstract
Specification analysis precedes model selection for structural analysis or forecasting. To explain a variable, one chooses an optimal subset of k predictors among m indicated variables, often by maximizing some goodness-of-fit measure such as R^2 (or F). Without such a process, one has potentially misleading data mining. Foster et al. (1997) use the maximum R^2 for this purpose. They argue that proper cut-off points of the R^2 distribution require consideration of the selection procedure, and hence the use of the distribution function of the maximal R^2. This difficult function must either be simulated by Monte Carlo or approximated, as in Foster et al., with Bonferroni or Rencher and Pun bounds. White (1997) proposes using a 'Reality Check,' comparing the forecasting performance of the candidate model against a benchmark. Out-of-sample prediction is a good performance test, but choosing the benchmark model is more difficult. Surprisingly, the full sample is not often exploited in testing for data mining. We argue that testing with both the full sample and recursive estimation along the sample reduces data mining problems. Before accepting a model with a significant global R^2, it is useful to test for coefficient stability and for significance of R^2 along the full sample. A sound theoretical model should remain valid if estimated and tested recursively. Foster et al. use the R^2 estimated with the full sample. But models may comply with maximal R^2 statistics and still be spurious (nonconstant coefficients). We propose to use the information from the recursive estimations to detect this situation. We add possible parameter variation to the processes of model selection and data mining, since it can bias the choice of benchmark model or the specification search among the m variables. Time-varying parameters (TVP) that are assumed constant produce misspecification error, possibly contaminating subsequent analyses.
Thus, del Hoyo and Llorente (1998a) study the improvement in forecasting that arises from considering nonconstant parameters. We consider both means (discrimination and stability) for decreasing biases in choosing a model. The first stage uses the R^2 or R^2_{max} to select the optimal explanatory variables. The second stage tests the stability and constancy of the relationship. The distributions of the recursive statistics are tabulated conditional on the discrimination stage. The innovation here is the sequential consideration of both procedures. Section 1 introduces the problem. Section 2 tabulates the distributions of the relevant statistics and considers their size and power. Section 3 introduces the sequential procedure described above. The conditional distributions are studied. Section 5 gives an illustration with a model proposed by Campbell, Grossman and Wang (1993). Section 6 concludes.
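The two stages can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (r_squared, best_subset, recursive_path), the simulated data, and the choice of an expanding estimation window are all assumptions made for illustration. The tabulated conditional critical values from the paper are not reproduced, so the sketch only computes the recursive R^2 and coefficient paths that such tests would be based on.

```python
import itertools
import numpy as np


def r_squared(y, X):
    """Return (R^2, coefficients) of an OLS regression of y on X plus an intercept."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss, beta


def best_subset(y, X, k):
    """Stage 1 (discrimination): exhaustive search for the k columns with maximal R^2."""
    best_cols, best_r2 = None, -np.inf
    for cols in itertools.combinations(range(X.shape[1]), k):
        r2, _ = r_squared(y, X[:, list(cols)])
        if r2 > best_r2:
            best_cols, best_r2 = cols, r2
    return best_cols, best_r2


def recursive_path(y, X, cols, start):
    """Stage 2 (stability): re-estimate on expanding samples of size start, ..., T."""
    r2s, betas = [], []
    for t in range(start, len(y) + 1):
        r2, beta = r_squared(y[:t], X[:t][:, list(cols)])
        r2s.append(r2)
        betas.append(beta)
    return np.array(r2s), np.array(betas)


# Simulated example (hypothetical data): m = 6 candidates, k = 2 true predictors
# with constant coefficients, so the recursive paths should settle down.
rng = np.random.default_rng(0)
T, m, k, start = 200, 6, 2, 30
X = rng.standard_normal((T, m))
y = 1.0 + 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.standard_normal(T)

cols, r2_full = best_subset(y, X, k)
r2_path, beta_path = recursive_path(y, X, cols, start)

print("selected columns:", cols)
print("full-sample R^2:", round(r2_full, 3))
print("range of recursive R^2:", round(r2_path.min(), 3), round(r2_path.max(), 3))
print("std. dev. of recursive slopes:", beta_path[:, 1:].std(axis=0).round(3))
```

A model that passes the first stage but shows large drift in beta_path, or a recursive R^2 that is significant only over part of the sample, would be a candidate for the spurious case described above. Judging that drift formally requires the conditional distributions tabulated in the paper.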