
Lost in the shuffle: Testing power in the presence of errorful network vertex labels

Author

Listed:
  • Saxena, Ayushi
  • Lyzinski, Vince

Abstract

Two-sample network hypothesis testing is an important inference task with applications across diverse fields such as medicine, neuroscience, and sociology. Many of these testing methodologies operate under the implicit assumption that the vertex correspondence across networks is a priori known. This assumption is often untrue, and the power of the subsequent test can degrade when there are misaligned/label-shuffled vertices across networks. This power loss due to shuffling is theoretically explored in the context of random dot product and stochastic block model networks for a pair of hypothesis tests based on Frobenius norm differences between estimated edge probability matrices or between adjacency matrices. The loss in testing power is further reinforced by numerous simulations and experiments, both in the stochastic block model and in the random dot product graph model, where the power loss across multiple recently proposed tests in the literature is considered. Lastly, the impact that shuffling can have in real-data testing is demonstrated in a pair of examples from neuroscience and from social network analysis.
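As a rough illustration of the effect described in the abstract, the sketch below samples two graphs from the same two-block stochastic block model, shuffles the labels of k vertices in one of them, and computes the squared Frobenius norm of the difference between the adjacency matrices. It is not the authors' procedure: the function names (sample_sbm, shuffle_labels, frobenius_stat, mean_stat), the block probabilities, and the graph sizes are all assumptions chosen for illustration. Under these assumptions the statistic is inflated on average when labels are shuffled across blocks, even though both graphs come from the same model.

```python
import numpy as np


def sample_sbm(block_probs, labels, rng):
    """Sample a symmetric, hollow 0/1 adjacency matrix from a stochastic block model."""
    P = block_probs[np.ix_(labels, labels)]          # n x n edge-probability matrix
    upper = np.triu(rng.random(P.shape) < P, k=1)    # independent edges above the diagonal
    return (upper | upper.T).astype(int)


def shuffle_labels(A, k, rng):
    """Permute the labels of k randomly chosen vertices of A (at most k vertices move)."""
    if k < 2:
        return A.copy()                              # nothing to shuffle
    n = A.shape[0]
    perm = np.arange(n)
    chosen = rng.choice(n, size=k, replace=False)
    perm[chosen] = rng.permutation(chosen)
    return A[np.ix_(perm, perm)]


def frobenius_stat(A, B):
    """Squared Frobenius norm of the difference between two adjacency matrices."""
    return float(np.sum((A - B) ** 2))


def mean_stat(k, reps=50, n_per_block=50, seed=0):
    """Monte Carlo mean of the statistic when both graphs share one SBM (illustrative values)."""
    rng = np.random.default_rng(seed)
    block_probs = np.array([[0.5, 0.1], [0.1, 0.5]])  # assumed block probabilities
    labels = np.repeat([0, 1], n_per_block)
    stats = []
    for _ in range(reps):
        A = sample_sbm(block_probs, labels, rng)
        B = shuffle_labels(sample_sbm(block_probs, labels, rng), k, rng)
        stats.append(frobenius_stat(A, B))
    return float(np.mean(stats))


if __name__ == "__main__":
    for k in (0, 10, 50, 100):
        print(f"k = {k:3d} shuffled vertices: mean statistic = {mean_stat(k):.1f}")
```

Because shuffling alone moves the statistic, a critical value calibrated for one amount of shuffling can be miscalibrated for another, which is one informal way to see why testing power may degrade when the vertex correspondence is errorful.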

Suggested Citation

  • Saxena, Ayushi & Lyzinski, Vince, 2025. "Lost in the shuffle: Testing power in the presence of errorful network vertex labels," Computational Statistics & Data Analysis, Elsevier, vol. 204(C).
  • Handle: RePEc:eee:csdana:v:204:y:2025:i:c:s0167947324001750
    DOI: 10.1016/j.csda.2024.108091

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947324001750
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2024.108091?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Guodong & Arroyo, Jesús & Athreya, Avanti & Cape, Joshua & Vogelstein, Joshua T. & Park, Youngser & White, Chris & Larson, Jonathan & Yang, Weiwei & Priebe, Carey E., 2025. "Multiple network embedding for anomaly detection in time series of graphs," Computational Statistics & Data Analysis, Elsevier, vol. 203(C).
    2. Chung, Jaewon & Bridgeford, Eric & Arroyo, Jesus & Pedigo, Benjamin D. & Saad-Eldin, Ali & Gopalakrishnan, Vivek & Xiang, Liang & Priebe, Carey E. & Vogelstein, Joshua T., 2020. "Statistical Connectomics," OSF Preprints ek4n3, Center for Open Science.
    3. Vainora, J., 2024. "Latent Position-Based Modeling of Parameter Heterogeneity," Cambridge Working Papers in Economics 2455, Faculty of Economics, University of Cambridge.
    4. repec:osf:osfxxx:ek4n3_v1 is not listed on IDEAS
    5. Linardi, Fernando & Diks, Cees & van der Leij, Marco & Lazier, Iuri, 2020. "Dynamic interbank network analysis using latent space models," Journal of Economic Dynamics and Control, Elsevier, vol. 112(C).
    6. Silvia D'Angelo & Marco Alfò & Thomas Brendan Murphy, 2020. "Modeling node heterogeneity in latent space models for multidimensional networks," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 74(3), pages 324-341, August.
    7. Laleh Tafakori & Armin Pourkhanali & Riccardo Rastelli, 2022. "Measuring systemic risk and contagion in the European financial network," Empirical Economics, Springer, vol. 63(1), pages 345-389, July.
    8. Yoder, Jordan & Chen, Li & Pao, Henry & Bridgeford, Eric & Levin, Keith & Fishkind, Donniell E. & Priebe, Carey & Lyzinski, Vince, 2020. "Vertex nomination: The canonical sampling and the extended spectral nomination schemes," Computational Statistics & Data Analysis, Elsevier, vol. 145(C).
    9. Patrick Rubin‐Delanchy & Joshua Cape & Minh Tang & Carey E. Priebe, 2022. "A statistical interpretation of spectral embedding: The generalised random dot product graph," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1446-1473, September.
    10. Ilenia Lovato & Alessia Pini & Aymeric Stamm & Maxime Taquet & Simone Vantini, 2021. "Multiscale null hypothesis testing for network‐valued data: Analysis of brain networks of patients with autism," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(2), pages 372-397, March.
    11. Lovato, Ilenia & Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2020. "Model-free two-sample test for network-valued data," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    12. Wang, Chao & Zhang, Xuemei & Hu, Xiaoqian & Lim, Ming K. & Xu, Yuanhong & Chang, Ping-Chen & Ghadimi, Pezhman, 2025. "Dynamics and drivers of global secondhand clothing trade: Implications for sustainable energy and circular economy in fashion," Renewable and Sustainable Energy Reviews, Elsevier, vol. 209(C).
    13. Shin Ji-Hyung & Infante-Rivard Claire & Graham Jinko & McNeney Brad, 2012. "Adjusting for Spurious Gene-by-Environment Interaction Using Case-Parent Triads," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-23, January.
    14. Ding, Yi & Li, Yingying & Liu, Guoli & Zheng, Xinghua, 2024. "Stock co-jump networks," Journal of Econometrics, Elsevier, vol. 239(2).
    15. Sándor Juhász, 2021. "Spinoffs and tie formation in cluster knowledge networks," Small Business Economics, Springer, vol. 56(4), pages 1385-1404, April.
    16. Yuan, Quan & Liu, Binghui, 2021. "Community detection via an efficient nonconvex optimization approach based on modularity," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    17. Ronaldo F. Zampolo & Frederico H. R. Lopes & Rodrigo M. S. de Oliveira & Martim F. Fernandes & Victor Dmitriev, 2024. "Dimensionality Reduction and Clustering Strategies for Label Propagation in Partial Discharge Data Sets," Energies, MDPI, vol. 17(23), pages 1-18, November.
    18. Jochmans, Koen, 2024. "Nonparametric identification and estimation of stochastic block models from many small networks," Journal of Econometrics, Elsevier, vol. 242(2).
    19. Kayvan Sadeghi & Alessandro Rinaldo, 2020. "Hierarchical models for independence structures of networks," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 74(3), pages 439-457, August.
    20. John McLevey & Alexander V. Graham & Reid McIlroy-Young & Pierson Browne & Kathryn S. Plaisance, 2018. "Interdisciplinarity and insularity in the diffusion of knowledge: an analysis of disciplinary boundaries between philosophy of science and the sciences," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 331-349, October.
    21. Arno de Caigny & Kristof Coussement & Koen W. de Bock & Stefan Lessmann, 2019. "Incorporating textual information in customer churn prediction models based on a convolutional neural network," Post-Print hal-02275958, HAL.
