Analysis of presence-only data via semi-supervised learning approaches
Presence-only data arise in classification problems and consist of a sample of observations from the presence class together with a large number of background observations whose presence/absence status is unknown. Because absence data are generally unavailable, conventional semi-supervised learning approaches are no longer appropriate: they tend to degenerate and assign all observations to the presence class. In this article, we propose a generalized class balance constraint that can be combined with semi-supervised learning approaches to prevent this degeneration. Furthermore, to circumvent the difficulty of model tuning with presence-only data, we develop a selection criterion based on classification stability, which measures the robustness of any given classification algorithm against sampling randomness. The effectiveness of the proposed approach is demonstrated through a variety of simulated examples, along with an application to gene function prediction.
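The abstract does not give the form of the constraint or of the stability criterion, so the following is only a rough illustration of the two ideas, not the authors' method. It assumes a toy one-dimensional classifier, a balance constraint expressed as a fixed share of background points labelled as presence, and instability measured as the average disagreement between classifiers fitted on two random halves of the data. The names `fit_threshold_classifier`, `instability`, and `presence_fraction` are hypothetical.

```python
import random
from statistics import mean


def fit_threshold_classifier(presence, background, presence_fraction):
    """Toy 1-D classifier (assumption, not the paper's model)."""
    center = mean(presence)
    # Score each background point by closeness to the presence mean.
    scores = sorted((-abs(x - center) for x in background), reverse=True)
    # Assumed class balance constraint: only a fixed share of the background
    # may be labelled presence, which rules out "everything is presence".
    k = max(1, int(round(presence_fraction * len(scores))))
    cutoff = scores[k - 1]
    return lambda x: 1 if -abs(x - center) >= cutoff else 0


def instability(presence, background, presence_fraction, n_splits=50, seed=0):
    """Average disagreement of two classifiers fitted on disjoint halves."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_splits):
        p, b = presence[:], background[:]
        rng.shuffle(p)
        rng.shuffle(b)
        hp, hb = len(p) // 2, len(b) // 2
        f1 = fit_threshold_classifier(p[:hp], b[:hb], presence_fraction)
        f2 = fit_threshold_classifier(p[hp:], b[hb:], presence_fraction)
        # Compare the two fitted rules on the full background sample.
        total += sum(f1(x) != f2(x) for x in background) / len(background)
    return total / n_splits


# Synthetic data: background mixes presence-like and absence-like points.
data_rng = random.Random(1)
presence = [data_rng.gauss(2.0, 0.5) for _ in range(40)]
background = ([data_rng.gauss(2.0, 0.5) for _ in range(60)]
              + [data_rng.gauss(-1.0, 1.0) for _ in range(140)])

# Lower instability suggests a more robust choice of the tuning parameter.
for fraction in (0.1, 0.3, 0.6):
    print(fraction, round(instability(presence, background, fraction), 3))
```

Under these assumptions, the tuning parameter would be chosen as the candidate value with the smallest instability, which needs no absence labels.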
Volume (Year): 59 (2013)
Issue (Month): C
Contact details of provider: Web page: http://www.elsevier.com/locate/csda
References listed on IDEAS
- Gill Ward & Trevor Hastie & Simon Barry & Jane Elith & John R. Leathwick, 2009. "Presence-Only Data and the EM Algorithm," Biometrics, The International Biometric Society, vol. 65(2), pages 554-563, 06.
- Junhui Wang, 2010. "Consistent selection of the number of clusters via crossvalidation," Biometrika, Biometrika Trust, vol. 97(4), pages 893-904.
- Nicolai Meinshausen & Peter Bühlmann, 2010. "Stability selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(4), pages 417-473.
- Junhui Wang & Xiaotong Shen & Yufeng Liu, 2008. "Probability estimation for large-margin classifiers," Biometrika, Biometrika Trust, vol. 95(1), pages 149-167.
Handle: RePEc:eee:csdana:v:59:y:2013:i:c:p:134-143