
Semi-Supervised Learning of Classifiers from a Statistical Perspective: A Brief Review

Author

Listed:
  • Ahfock, Daniel
  • McLachlan, Geoffrey J.

Abstract

Semi-supervised learning (SSL) approaches in machine learning have received increasing attention for forming a classifier when the training data consist of a limited number of classified observations but a much larger number of unclassified observations. This is because procuring classified data can be quite costly: acquisition costs are high, and financial, time, and ethical issues can arise in attempts to provide the true class labels for the unclassified data that have been acquired. A review is provided of statistical SSL approaches to this problem, focussing on the recent result that a classifier formed from a partially classified sample can actually have a smaller expected error rate than one formed from the same sample completely classified. This rather paradoxical outcome can be achieved by introducing a framework with a missingness mechanism for the missing labels of the unclassified observations. It is most relevant in situations that commonly occur in practice, where the unclassified data fall primarily in regions of relatively high entropy in the feature space, which makes their class labels difficult to obtain.

Suggested Citation

  • Ahfock, Daniel & McLachlan, Geoffrey J., 2023. "Semi-Supervised Learning of Classifiers from a Statistical Perspective: A Brief Review," Econometrics and Statistics, Elsevier, vol. 26(C), pages 124-138.
  • Handle: RePEc:eee:ecosta:v:26:y:2023:i:c:p:124-138
    DOI: 10.1016/j.ecosta.2022.03.007

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S2452306222000296
    Download Restriction: Full text for ScienceDirect subscribers only. Contains open access articles



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Utkarsh J. Dang & Michael P.B. Gallaugher & Ryan P. Browne & Paul D. McNicholas, 2023. "Model-Based Clustering and Classification Using Mixtures of Multivariate Skewed Power Exponential Distributions," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 145-167, April.
    2. Sharon M. McNicholas & Paul D. McNicholas & Daniel A. Ashlock, 2021. "An Evolutionary Algorithm with Crossover and Mutation for Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 264-279, July.
    3. Vernon T. Farewell & Li Su & Christopher Jackson, 2019. "Partially hidden multi-state modelling of a prolonged disease state defined by a composite outcome," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(4), pages 696-711, October.
    4. Paula M. Murray & Ryan P. Browne & Paul D. McNicholas, 2020. "Mixtures of Hidden Truncation Hyperbolic Factor Analyzers," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 366-379, July.
    5. repec:osf:osfxxx:f9jvz_v1 is not listed on IDEAS
    6. Paul D. McNicholas, 2016. "Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 331-373, October.
    7. Michael P. B. Gallaugher & Paul D. McNicholas, 2019. "On Fractionally-Supervised Classification: Weight Selection and Extension to the Multivariate t-Distribution," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 232-265, July.
    8. Fei Wang & Yuhao Deng, 2023. "Non-Asymptotic Bounds of AIPW Estimators for Means with Missingness at Random," Mathematics, MDPI, vol. 11(4), pages 1-14, February.
    9. Ruizhe Chen & Yu-Che Chung & Sanjib Basu & Qian Shi, 2024. "Diagnostic Test for Realized Missingness in Mixed-type Data," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 86(1), pages 109-138, May.
    10. Robitzsch, Alexander, 2020. "About Still Nonignorable Consequences of (Partially) Ignoring Missing Item Responses in Large-scale Assessment," OSF Preprints hmy45, Center for Open Science.
    11. Ali, Saif & Arora, Gaurav, 2021. "Well-level Missingness Mechanisms in Administrative Groundwater Monitoring Data for Uttar Pradesh (UP), India, 2009-2018," 2021 Annual Meeting, August 1-3, Austin, Texas 314038, Agricultural and Applied Economics Association.
    12. Jouni Kuha & Myrsini Katsikatsou & Irini Moustaki, 2018. "Latent variable modelling with non‐ignorable item non‐response: multigroup response propensity models for cross‐national analysis," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(4), pages 1169-1192, October.
    13. Francesco Bartolucci & Giorgio E. Montanari & Silvia Pandolfi, 2018. "Latent Ignorability and Item Selection for Nursing Home Case-Mix Evaluation," Journal of Classification, Springer;The Classification Society, vol. 35(1), pages 172-193, April.
    14. Francesco Bartolucci & Giorgio E. Montanari & Silvia Pandolfi, 2016. "Item selection by latent class-based methods: an application to nursing home evaluation," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(2), pages 245-262, June.
    15. Chenguang Wang & Michael J. Daniels, 2011. "A Note on MAR, Identifying Restrictions, Model Comparison, and Sensitivity Analysis in Pattern Mixture Models with and without Covariates for Incomplete Data," Biometrics, The International Biometric Society, vol. 67(3), pages 810-818, September.
    16. A. R. Linero, 2017. "Bayesian nonparametric analysis of longitudinal studies in the presence of informative missingness," Biometrika, Biometrika Trust, vol. 104(2), pages 327-341.
    17. Bartolucci, Francesco & Giorgio E., Montanari & Pandolfi, Silvia, 2012. "Item selection by an extended Latent Class model: An application to nursing homes evaluation," MPRA Paper 38757, University Library of Munich, Germany.
    18. repec:osf:osfxxx:hmy45_v1 is not listed on IDEAS
    19. Antonio R. Linero & Michael J. Daniels, 2015. "A Flexible Bayesian Approach to Monotone Missing Data in Longitudinal Studies With Nonignorable Missingness With Application to an Acute Schizophrenia Clinical Trial," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 45-55, March.
    20. Marco Doretti & Sara Geneletti & Elena Stanghellini, 2018. "Missing Data: A Unified Taxonomy Guided by Conditional Independence," International Statistical Review, International Statistical Institute, vol. 86(2), pages 189-204, August.
    21. Douglas L. Steinley, 2019. "Editorial: Journal of Classification Vol. 36–2," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 175-176, July.
    22. Douglas L. Steinley, 2016. "Editorial," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 327-330, October.

