Printed from https://ideas.repec.org/a/eee/csdana/v99y2016icp171-188.html

SMILE: A novel dissimilarity-based procedure for detecting sparse-specific profiles in sparse contingency tables

Author

Listed:
  • Emily, Mathieu
  • Hitte, Christophe
  • Mom, Alain

Abstract

A novel statistical procedure for clustering individuals characterized by sparse-specific profiles is introduced in the context of data summarized in sparse contingency tables. The proposed procedure relies on a single-linkage clustering based on a new dissimilarity measure designed to give equal influence to sparsity and specificity of profiles. Theoretical properties of the new dissimilarity are derived by characterizing single-linkage clustering using Minimum Spanning Trees. Such characterization allows the description of situations for which the proposed dissimilarity outperforms competing dissimilarities. Simulation examples are performed to demonstrate the strength of the new dissimilarity compared to 11 other methods. The analysis of a genomic dataset dedicated to the study of molecular signatures of selection is used to illustrate the efficiency of the proposed method in a real situation.
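The link between single-linkage clustering and Minimum Spanning Trees mentioned in the abstract can be illustrated with a minimal sketch. It is not the SMILE procedure: the abstract does not define the new dissimilarity, so a plain Euclidean distance between row profiles is assumed here as a stand-in, and the toy table and all function names are hypothetical. The sketch only shows the general mechanism: building the MST of the dissimilarity matrix and removing its k-1 heaviest edges yields the k single-linkage clusters.

```python
import math


def row_profiles(table):
    """Turn each row of counts into relative frequencies (all-zero rows stay zero)."""
    profiles = []
    for row in table:
        total = sum(row)
        profiles.append([c / total if total else 0.0 for c in row])
    return profiles


def euclidean(p, q):
    """Placeholder dissimilarity (an assumption, NOT the SMILE dissimilarity)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric matrix; returns (weight, i, j) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def single_linkage_clusters(dist, k):
    """Keep the n-k lightest MST edges; connected components are the k clusters."""
    n = len(dist)
    kept = sorted(minimum_spanning_tree(dist))[: n - k]
    root = list(range(n))   # union-find forest over the items

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Hypothetical toy table: rows 2 and 3 are sparse and concentrated on one column.
toy_table = [
    [10, 9, 11, 10],
    [20, 21, 19, 20],
    [0, 0, 5, 0],
    [0, 0, 7, 1],
    [12, 11, 10, 12],
]
profiles = row_profiles(toy_table)
dist = [[euclidean(p, q) for q in profiles] for p in profiles]
cluster_labels = single_linkage_clusters(dist, k=2)
print(cluster_labels)
```

On this toy table the two sparse rows end up in one cluster and the three dense, nearly uniform rows in the other. Swapping `euclidean` for another dissimilarity changes only the matrix passed to `single_linkage_clusters`; the MST step stays the same.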

Suggested Citation

  • Emily, Mathieu & Hitte, Christophe & Mom, Alain, 2016. "SMILE: A novel dissimilarity-based procedure for detecting sparse-specific profiles in sparse contingency tables," Computational Statistics & Data Analysis, Elsevier, vol. 99(C), pages 171-188.
  • Handle: RePEc:eee:csdana:v:99:y:2016:i:c:p:171-188
    DOI: 10.1016/j.csda.2016.01.017

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947316300032
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Goslee, Sarah C. & Urban, Dean L., 2007. "The ecodist Package for Dissimilarity-based Analysis of Ecological Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 22(i07).
    2. J. C. Gower & G. J. S. Ross, 1969. "Minimum Spanning Trees and Single Linkage Cluster Analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 18(1), pages 54-64, March.
    3. Witten, Daniela M. & Tibshirani, Robert, 2010. "A Framework for Feature Selection in Clustering," Journal of the American Statistical Association, American Statistical Association, vol. 105(490), pages 713-726.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Roberto Rocci & Maurizio Vichi & Monia Ranalli, 2025. "Mixture models for simultaneous classification and reduction of three-way data," Computational Statistics, Springer, vol. 40(1), pages 469-507, January.
    2. Ronglai Shen & Qianxing Mo & Nikolaus Schultz & Venkatraman E Seshan & Adam B Olshen & Jason Huse & Marc Ladanyi & Chris Sander, 2012. "Integrative Subtype Discovery in Glioblastoma Using iCluster," PLOS ONE, Public Library of Science, vol. 7(4), pages 1-9, April.
    3. Yaeji Lim & Hee-Seok Oh & Ying Kuen Cheung, 2019. "Multiscale Clustering for Functional Data," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 368-391, July.
    4. Yujia Li & Xiangrui Zeng & Chien‐Wei Lin & George C. Tseng, 2022. "Simultaneous estimation of cluster number and feature sparsity in high‐dimensional cluster analysis," Biometrics, The International Biometric Society, vol. 78(2), pages 574-585, June.
    5. Zhaoyu Xing & Yang Wan & Juan Wen & Wei Zhong, 2024. "GOLFS: feature selection via combining both global and local information for high dimensional clustering," Computational Statistics, Springer, vol. 39(5), pages 2651-2675, July.
    6. Ekaterina Dolbunova & Alexandre Lucquin & T. Rowan McLaughlin & Manon Bondetti & Blandine Courel & Ester Oras & Henny Piezonka & Harry K. Robson & Helen Talbot & Kamil Adamczak & Konstantin Andreev & , 2023. "The transmission of pottery technology among prehistoric European hunter-gatherers," Nature Human Behaviour, Nature, vol. 7(2), pages 171-183, February.
    7. Dong Liu & Changwei Zhao & Yong He & Lei Liu & Ying Guo & Xinsheng Zhang, 2023. "Simultaneous cluster structure learning and estimation of heterogeneous graphs for matrix‐variate fMRI data," Biometrics, The International Biometric Society, vol. 79(3), pages 2246-2259, September.
    8. Jeffrey Andrews & Paul McNicholas, 2014. "Variable Selection for Clustering and Classification," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 136-153, July.
    9. Bartelme, Dominick & Lan, Ting & Levchenko, Andrei A., 2024. "Specialization, market access and real income," Journal of International Economics, Elsevier, vol. 150(C).
    10. Turner, Rachel A. & Polunin, Nicholas V.C. & Stead, Selina M., 2015. "Mapping inshore fisheries: Comparing observed and perceived distributions of pot fishing activity in Northumberland," Marine Policy, Elsevier, vol. 51(C), pages 173-181.
    11. Congmin Zhu & Linwei Wu & Daliang Ning & Renmao Tian & Shuhong Gao & Bing Zhang & Jianshu Zhao & Ya Zhang & Naijia Xiao & Yajiao Wang & Mathew R. Brown & Qichao Tu & Feng Ju & George F. Wells & Jianhu, 2025. "Global diversity and distribution of antibiotic resistance genes in human wastewater treatment systems," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    12. Huaihou Chen & Philip T. Reiss & Thaddeus Tarpey, 2014. "Optimally weighted L-super-2 distance for functional data," Biometrics, The International Biometric Society, vol. 70(3), pages 516-525, September.
    13. Axel Theorell & Yenan Troi Bryceson & Jakob Theorell, 2019. "Determination of essential phenotypic elements of clusters in high-dimensional entities—DEPECHE," PLOS ONE, Public Library of Science, vol. 14(3), pages 1-15, March.
    14. Anna C. Peterson & Himanshu Sharma & Arvind Kumar & Bruno M. Ghersi & Scott J. Emrich & Kurt J. Vandegrift & Amit Kapoor & Michael J. Blum, 2021. "Rodent Virus Diversity and Differentiation across Post-Katrina New Orleans," Sustainability, MDPI, vol. 13(14), pages 1-18, July.
    15. Beibei Zhang & Rong Chen, 2018. "Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic," Journal of Classification, Springer;The Classification Society, vol. 35(3), pages 394-421, October.
    16. Ishrat Z. Anka & Tamsyn M. Uren Webster & Waldir M. Berbel-Filho & Matthew Hitchings & Benjamin Overland & Sarah Weller & Carlos Garcia de Leaniz & Sofia Consuegra, 2024. "Microbiome and epigenetic variation in wild fish with low genetic diversity," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    17. Davide Rassati & Massimo Faccoli & Robert A Haack & Robert J Rabaglia & Edoardo Petrucco Toffolo & Andrea Battisti & Lorenzo Marini, 2016. "Bark and Ambrosia Beetles Show Different Invasion Patterns in the USA," PLOS ONE, Public Library of Science, vol. 11(7), pages 1-17, July.
    18. Michael F Bonner & Russell A Epstein, 2018. "Computational mechanisms underlying cortical responses to the affordance properties of visual scenes," PLOS Computational Biology, Public Library of Science, vol. 14(4), pages 1-31, April.
    19. Arias-Castro, Ery & Pu, Xiao, 2017. "A simple approach to sparse clustering," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 217-228.
    20. Sung-Soo Kim & W. Krzanowski, 2007. "Detecting multiple outliers in linear regression using a cluster method combined with graphical visualization," Computational Statistics, Springer, vol. 22(1), pages 109-119, April.
