Principal curves were defined by Hastie and Stuetzle (JASA, 1989) as smooth curves passing through the middle of a multidimensional data set. They are nonlinear generalizations of the first principal component, a characterization of which is the basis for the principal curves definition. In this paper we propose an alternative approach based on a different property of principal components. Consider a point in the space where a multivariate normal is defined and, for each hyperplane containing that point, compute the total variance of the normal distribution conditioned on lying in that hyperplane. Now choose the hyperplane minimizing this conditional total variance and find the corresponding conditional mean. The first principal component of the original distribution passes through this conditional mean and is orthogonal to that hyperplane. This property is easily generalized to data sets with nonlinear structure. Repeating the search from different starting points, many points analogous to conditional means are found. We call them principal oriented points. When a one-dimensional curve runs through the set of these special points, it is called a principal curve of oriented points. Successive principal curves are recursively defined from a generalization of the total variance.
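The normal-distribution property stated above can be checked numerically. The following is a minimal sketch, not the paper's procedure: it assumes a bivariate normal with illustrative values for mu, Sigma and the starting point x, and it uses the standard formulas for conditioning a normal vector X on b'X = b'x, where b is the unit normal of the hyperplane. The function names and the angle grid are hypothetical choices made for this illustration.

```python
import numpy as np

def conditional_total_variance(Sigma, b):
    """Trace of Cov(X | b'X = const) for X ~ N(mu, Sigma)."""
    b = b / np.linalg.norm(b)
    Sb = Sigma @ b
    # Cov(X | b'X) = Sigma - Sigma b b' Sigma / (b' Sigma b)
    return np.trace(Sigma) - (Sb @ Sb) / (b @ Sb)

def conditional_mean(mu, Sigma, x, b):
    """E[X | b'X = b'x], the mean on the hyperplane through x with normal b."""
    b = b / np.linalg.norm(b)
    Sb = Sigma @ b
    return mu + Sb * (b @ (x - mu)) / (b @ Sb)

# Illustrative (assumed) parameters, not taken from the paper.
mu = np.array([1.0, -2.0])
Sigma = np.array([[4.0, 1.5], [1.5, 1.0]])
x = np.array([3.0, 0.5])

# Search over hyperplane normals; in 2D a hyperplane is a line through x.
angles = np.linspace(0.0, np.pi, 18001)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
tv = np.array([conditional_total_variance(Sigma, b) for b in dirs])
b_star = dirs[np.argmin(tv)]

# First principal direction: eigenvector of the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
v1 = eigvecs[:, -1]

# The minimizing normal should be parallel to v1 (sign is arbitrary).
alignment = abs(b_star @ v1)

# The conditional mean should lie on the line mu + t * v1.
m = conditional_mean(mu, Sigma, x, b_star)
r = m - mu
off_axis = np.linalg.norm(r - (r @ v1) * v1)

print("alignment with first principal direction:", alignment)
print("distance of conditional mean from first principal axis:", off_axis)
```

In this sketch the minimizing normal coincides with the first principal direction up to grid resolution, so the optimal hyperplane is orthogonal to the first principal component, and the conditional mean falls on the first principal axis. For data with nonlinear structure the closed-form expressions no longer apply and would have to be replaced by local sample estimates.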
Publisher Info
Paper provided by Department of Economics and Business, Universitat Pompeu Fabra in its series Economics Working Papers with number 309.
JEL classification:
C10 - Mathematical and Quantitative Methods; Econometric and Statistical Methods: General; General
C14 - Mathematical and Quantitative Methods; Econometric and Statistical Methods: General; Semiparametric and Nonparametric Methods