Efficient manifold approximation with spherelets

Authors

  • Didong Li
  • Minerva Mukhopadhyay
  • David B. Dunson

Abstract

In statistical dimensionality reduction, it is common to rely on the assumption that high dimensional data tend to concentrate near a lower dimensional manifold. There is a rich literature on approximating the unknown manifold, and on exploiting such approximations in clustering, data compression, and prediction. Most of the literature relies on linear or locally linear approximations. In this article, we propose a simple and general alternative, which instead uses spheres, an approach we refer to as spherelets. We develop spherical principal components analysis (SPCA), and provide theory on the convergence rate for global and local SPCA, while showing that spherelets can provide lower covering numbers and mean squared errors for many manifolds. Results relative to state‐of‐the‐art competitors show gains in ability to accurately approximate manifolds with fewer components. Unlike most competitors, which simply output lower‐dimensional features, our approach projects data onto the estimated manifold to produce fitted values that can be used for model assessment and cross validation. The methods are illustrated with applications to multiple data sets.
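
To make the idea of approximating data with a sphere and producing fitted values more concrete, the following is a minimal sketch in Python. It is not the authors' SPCA estimator: it assumes a plain algebraic least-squares sphere fit in the ambient space, and the function names `fit_sphere` and `project_to_sphere`, the simulated arc, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_sphere(X):
    """Algebraic least-squares sphere fit (illustrative, not the paper's SPCA).

    Uses the identity ||x||^2 = 2 c.x + (r^2 - ||c||^2), which is linear in
    the unknowns (c, b) with b = r^2 - ||c||^2.
    """
    X = np.asarray(X, dtype=float)
    A = np.hstack([2.0 * X, np.ones((X.shape[0], 1))])
    y = np.sum(X**2, axis=1)
    sol, *_ = np.linalg.lstsq(A, y, rcond=None)
    center, b = sol[:-1], sol[-1]
    radius = np.sqrt(b + center @ center)
    return center, radius


def project_to_sphere(X, center, radius):
    """Map each point to the nearest point on the fitted sphere."""
    D = np.asarray(X, dtype=float) - center
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    return center + radius * D / norms


# Simulated data (an assumption for this sketch): a noisy quarter circle.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, np.pi / 2, 200)
true_center = np.array([1.0, -2.0])
true_radius = 3.0
clean = true_center + true_radius * np.column_stack([np.cos(theta), np.sin(theta)])
noisy = clean + 0.01 * rng.normal(size=clean.shape)

center, radius = fit_sphere(noisy)
fitted = project_to_sphere(noisy, center, radius)

# Fitted values live in the original space, so a reconstruction error
# can be computed directly and used for model assessment.
mse = np.mean(np.sum((noisy - fitted) ** 2, axis=1))
print(center, radius, mse)
```

Because the projection returns points in the original coordinates rather than lower-dimensional features, the reconstruction error above could in principle be evaluated on held-out points, which is the kind of use for cross validation that the abstract describes.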

Suggested Citation

  • Didong Li & Minerva Mukhopadhyay & David B. Dunson, 2022. "Efficient manifold approximation with spherelets," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1129-1149, September.
  • Handle: RePEc:bla:jorssb:v:84:y:2022:i:4:p:1129-1149
    DOI: 10.1111/rssb.12508

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssb.12508
    Download Restriction: no

    References listed on IDEAS

    1. Johnstone, Iain M. & Lu, Arthur Yu, 2009. "On Consistency and Sparsity for Principal Components Analysis in High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 682-693.
    2. J. Kruskal, 1964. "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis," Psychometrika, Springer;The Psychometric Society, vol. 29(1), pages 1-27, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Roger Shepard, 1974. "Representation of structure in similarity data: Problems and prospects," Psychometrika, Springer;The Psychometric Society, vol. 39(4), pages 373-421, December.
    2. Giovanna Boccuzzo & Licia Maron, 2017. "Proposal of a composite indicator of job quality based on a measure of weighted distances," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(5), pages 2357-2374, September.
    3. Busch, Christin & Specht, Kathrin & Inostroza, Luis & Falke, Matthias & Zepp, Harald, 2024. "Disentangling cultural ecosystem services co-production in urban green spaces through social media reviews," Ecosystem Services, Elsevier, vol. 70(C).
    4. Yata, Kazuyoshi & Aoshima, Makoto, 2013. "PCA consistency for the power spiked model in high-dimensional settings," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 334-354.
    5. Ma, Jie & Tse, Ying Kei & Wang, Xiaojun & Zhang, Minhao, 2019. "Examining customer perception and behaviour through social media research – An empirical study of the United Airlines overbooking crisis," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 127(C), pages 192-205.
    6. Muñoz-Mas, Rafael & Vezza, Paolo & Alcaraz-Hernández, Juan Diego & Martínez-Capel, Francisco, 2016. "Risk of invasion predicted with support vector machines: A case study on northern pike (Esox Lucius, L.) and bleak (Alburnus alburnus, L.)," Ecological Modelling, Elsevier, vol. 342(C), pages 123-134.
    7. Ivan Mihál & Eva Luptáková & Martin Pavlík, 2021. "Wood-inhabiting macromycete communities in spruce stands on former agricultural land," Journal of Forest Science, Czech Academy of Agricultural Sciences, vol. 67(2), pages 51-65.
    8. Wang, Shao-Hsuan & Huang, Su-Yun, 2022. "Perturbation theory for cross data matrix-based PCA," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    9. Marie Diekmann & Ludwig Theuvsen, 2019. "Value structures determining community supported agriculture: insights from Germany," Agriculture and Human Values, Springer;The Agriculture, Food, & Human Values Society (AFHVS), vol. 36(4), pages 733-746, December.
    10. Silin, Igor & Spokoiny, Vladimir, 2018. "Bayesian inference for spectral projectors of covariance matrix," IRTG 1792 Discussion Papers 2018-027, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    11. Barigozzi, Matteo & Trapani, Lorenzo, 2020. "Sequential testing for structural stability in approximate factor models," Stochastic Processes and their Applications, Elsevier, vol. 130(8), pages 5149-5187.
    12. D. V. Pahan Prasada, 2013. "Domestic versus Multilateral Institutions in Bilateral Trade: A Comparative Gravity Analysis," International Economic Journal, Taylor & Francis Journals, vol. 27(1), pages 127-142, March.
    13. Steland, Ansgar, 2020. "Testing and estimating change-points in the covariance matrix of a high-dimensional time series," Journal of Multivariate Analysis, Elsevier, vol. 177(C).
    14. Malcolm Dow & Peter Willett & Roderick McDonald & Belver Griffith & Michael Greenacre & Peter Bryant & Daniel Wartenberg & Ove Frank, 1987. "Book reviews," Journal of Classification, Springer;The Classification Society, vol. 4(2), pages 245-278, September.
    15. Mark Davison, 1988. "A reformulation of the general Euclidean model for the external analysis of preference data," Psychometrika, Springer;The Psychometric Society, vol. 53(3), pages 305-320, September.
    16. Enrico di Bella & Matteo Corsi & Lucia Leporatti, 2015. "A Multi-indicator Approach for Smart Security Policy Making," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 122(3), pages 653-675, July.
    17. Willem Heiser, 1991. "A generalized majorization method for least squares multidimensional scaling of pseudodistances that may be negative," Psychometrika, Springer;The Psychometric Society, vol. 56(1), pages 7-27, March.
    18. Pietro Lovaglio & Mario Mezzanzanica, 2013. "Classification of longitudinal career paths," Quality & Quantity: International Journal of Methodology, Springer, vol. 47(2), pages 989-1008, February.
    19. Lam, Clifford & Yao, Qiwei & Bathia, Neil, 2011. "Estimation of latent factors for high-dimensional time series," LSE Research Online Documents on Economics 31549, London School of Economics and Political Science, LSE Library.
    20. Wang, Zhangyuan & Xu, Wenhua & Han, Xuliang & Li, Ruipeng & Gong, Jiaye & Cui, Weicheng & Fan, Dixia, 2025. "A low cost strategy on energy harvesting of flapping foil with time-warping optimization," Energy, Elsevier, vol. 337(C).
