IDEAS home Printed from https://ideas.repec.org/a/bla/jorssb/v84y2022i4p1129-1149.html

Efficient manifold approximation with spherelets

Authors

  • Didong Li
  • Minerva Mukhopadhyay
  • David B. Dunson

Abstract

In statistical dimensionality reduction, it is common to rely on the assumption that high dimensional data tend to concentrate near a lower dimensional manifold. There is a rich literature on approximating the unknown manifold, and on exploiting such approximations in clustering, data compression, and prediction. Most of the literature relies on linear or locally linear approximations. In this article, we propose a simple and general alternative, which instead uses spheres, an approach we refer to as spherelets. We develop spherical principal components analysis (SPCA), and provide theory on the convergence rate for global and local SPCA, while showing that spherelets can provide lower covering numbers and mean squared errors for many manifolds. Results relative to state‐of‐the‐art competitors show gains in ability to accurately approximate manifolds with fewer components. Unlike most competitors, which simply output lower‐dimensional features, our approach projects data onto the estimated manifold to produce fitted values that can be used for model assessment and cross validation. The methods are illustrated with applications to multiple data sets.
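
To make the idea of approximating data by a sphere and projecting onto it to obtain fitted values more concrete, here is a minimal sketch in Python. It is not the authors' SPCA estimator: it uses a generic algebraic least-squares sphere fit, skips any preliminary linear dimension reduction, and all function names (`solve_linear`, `fit_sphere`, `project_to_sphere`) are hypothetical names chosen for illustration. The fit rewrites the sphere equation ||x - c||^2 = r^2 as ||x||^2 = 2 c.x + b with b = r^2 - ||c||^2, which is linear in the unknowns (c, b).

```python
import math


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the points may be degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_sphere(points):
    """Algebraic least-squares sphere fit (illustrative; not the paper's estimator).

    Solves ||x||^2 = 2 c.x + b in the least-squares sense via the normal
    equations, then recovers the radius from b = r^2 - ||c||^2.
    """
    dim = len(points[0])
    rows = [[2.0 * v for v in p] + [1.0] for p in points]
    rhs = [sum(v * v for v in p) for p in points]
    n = dim + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(n)]
    theta = solve_linear(ata, aty)
    center = theta[:dim]
    radius = math.sqrt(theta[dim] + sum(c * c for c in center))
    return center, radius


def project_to_sphere(point, center, radius):
    """Return the closest point on the fitted sphere (the 'fitted value')."""
    diff = [p - c for p, c in zip(point, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("the center has no unique projection onto the sphere")
    return [c + radius * d / norm for c, d in zip(center, diff)]
```

Given a list of points, `fit_sphere(points)` returns a center and radius, and `project_to_sphere` maps each observation to a fitted value on that sphere; the squared distances between observations and fitted values give a reconstruction error of the kind that could be used for model assessment or cross validation.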

Suggested Citation

  • Didong Li & Minerva Mukhopadhyay & David B. Dunson, 2022. "Efficient manifold approximation with spherelets," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1129-1149, September.
  • Handle: RePEc:bla:jorssb:v:84:y:2022:i:4:p:1129-1149
    DOI: 10.1111/rssb.12508

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssb.12508
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Roger Shepard, 1974. "Representation of structure in similarity data: Problems and prospects," Psychometrika, Springer;The Psychometric Society, vol. 39(4), pages 373-421, December.
    2. Giovanna Boccuzzo & Licia Maron, 2017. "Proposal of a composite indicator of job quality based on a measure of weighted distances," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(5), pages 2357-2374, September.
    3. Puyi Fang & Zhaoxing Gao & Ruey S. Tsay, 2023. "Determination of the effective cointegration rank in high-dimensional time-series predictive regressions," Papers 2304.12134, arXiv.org, revised Apr 2023.
    4. Candelon, B. & Hurlin, C. & Tokpavi, S., 2012. "Sampling error and double shrinkage estimation of minimum variance portfolios," Journal of Empirical Finance, Elsevier, vol. 19(4), pages 511-527.
    5. Fan, Jianqing & Jiang, Bai & Sun, Qiang, 2022. "Bayesian factor-adjusted sparse regression," Journal of Econometrics, Elsevier, vol. 230(1), pages 3-19.
    6. Jong-Seok Lee & Dan Zhu, 2012. "Shilling Attack Detection---A New Approach for a Trustworthy Recommender System," INFORMS Journal on Computing, INFORMS, vol. 24(1), pages 117-131, February.
    7. Yata, Kazuyoshi & Aoshima, Makoto, 2013. "PCA consistency for the power spiked model in high-dimensional settings," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 334-354.
    8. Asai, Manabu & McAleer, Michael, 2015. "Forecasting co-volatilities via factor models with asymmetry and long memory in realized covariance," Journal of Econometrics, Elsevier, vol. 189(2), pages 251-262.
    9. Ján Kulfan & Lenka Sarvašová & Michal Parák & Marek Dzurenko & Peter Zach, 2018. "Can late flushing trees avoid attack by moth larvae in temperate forests?," Plant Protection Science, Czech Academy of Agricultural Sciences, vol. 54(4), pages 272-283.
    10. Ma, Jie & Tse, Ying Kei & Wang, Xiaojun & Zhang, Minhao, 2019. "Examining customer perception and behaviour through social media research – An empirical study of the United Airlines overbooking crisis," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 127(C), pages 192-205.
    11. Muñoz-Mas, Rafael & Vezza, Paolo & Alcaraz-Hernández, Juan Diego & Martínez-Capel, Francisco, 2016. "Risk of invasion predicted with support vector machines: A case study on northern pike (Esox Lucius, L.) and bleak (Alburnus alburnus, L.)," Ecological Modelling, Elsevier, vol. 342(C), pages 123-134.
    12. Ivan Mihál & Eva Luptáková & Martin Pavlík, 2021. "Wood-inhabiting macromycete communities in spruce stands on former agricultural land," Journal of Forest Science, Czech Academy of Agricultural Sciences, vol. 67(2), pages 51-65.
    13. Maillet, Bertrand & Tokpavi, Sessi & Vaucher, Benoit, 2015. "Global minimum variance portfolio optimisation under some model risk: A robust regression-based approach," European Journal of Operational Research, Elsevier, vol. 244(1), pages 289-299.
    14. Venera Tomaselli, 1996. "Multivariate statistical techniques and sociological research," Quality & Quantity: International Journal of Methodology, Springer, vol. 30(3), pages 253-276, August.
    15. Simensen, Trond & Halvorsen, Rune & Erikstad, Lars, 2018. "Methods for landscape characterisation and mapping: A systematic review," Land Use Policy, Elsevier, vol. 75(C), pages 557-569.
    16. Wang, Shao-Hsuan & Huang, Su-Yun, 2022. "Perturbation theory for cross data matrix-based PCA," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    17. Marie Diekmann & Ludwig Theuvsen, 2019. "Value structures determining community supported agriculture: insights from Germany," Agriculture and Human Values, Springer;The Agriculture, Food, & Human Values Society (AFHVS), vol. 36(4), pages 733-746, December.
    18. Namvar, Ethan & Phillips, Blake & Pukthuanthong, Kuntara & Raghavendra Rau, P., 2016. "Do hedge funds dynamically manage systematic risk?," Journal of Banking & Finance, Elsevier, vol. 64(C), pages 1-15.
    19. Li, Weiming & Gao, Jing & Li, Kunpeng & Yao, Qiwei, 2016. "Modelling multivariate volatilities via latent common factors," LSE Research Online Documents on Economics 68121, London School of Economics and Political Science, LSE Library.
    20. Silin, Igor & Spokoiny, Vladimir, 2018. "Bayesian inference for spectral projectors of covariance matrix," IRTG 1792 Discussion Papers 2018-027, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".

