Analysis of presence-only data via semi-supervised learning approaches
Presence-only data arise in classification problems and consist of a sample of observations from the presence class together with a large number of background observations whose presence/absence status is unknown. Since absence data are generally unavailable, conventional semi-supervised learning approaches are no longer appropriate, as they tend to degenerate and assign all observations to the presence class. In this article, we propose a generalized class balance constraint, which can be incorporated into semi-supervised learning approaches to prevent such degeneration. Furthermore, to circumvent the difficulty of model tuning with presence-only data, a selection criterion based on classification stability is developed, which measures the robustness of any given classification algorithm against sampling randomness. The effectiveness of the proposed approach is demonstrated through a variety of simulated examples, along with an application to gene function prediction.
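The idea of measuring robustness against sampling randomness can be illustrated with a small sketch. This is not the authors' selection criterion or their class balance constraint, neither of which is specified in the abstract. It assumes a hypothetical toy classifier (`centroid_rule`) whose tuning parameter `frac` fixes the share of background points labelled as presence, and a hypothetical `instability` score defined as the average disagreement between two fits on independent bootstrap resamples. The function names, the toy data, and the disagreement measure are all assumptions made for illustration.

```python
import numpy as np


def make_toy_data(seed=0):
    """Hypothetical data: labelled presence points plus an unlabelled background mix."""
    rng = np.random.default_rng(seed)
    presence = rng.normal(2.0, 1.0, size=(40, 2))
    background = np.vstack([
        rng.normal(2.0, 1.0, size=(60, 2)),    # hidden presences
        rng.normal(-2.0, 1.0, size=(140, 2)),  # hidden absences
    ])
    return presence, background


def centroid_rule(presence, background, frac):
    """Toy classifier (an assumption, not the paper's method).

    A point is labelled presence (1) if it lies within a radius of the
    presence centroid; the radius is the `frac` quantile of background
    distances, so roughly `frac` of the background is labelled presence.
    """
    center = presence.mean(axis=0)
    radius = np.quantile(np.linalg.norm(background - center, axis=1), frac)
    return lambda x: (np.linalg.norm(x - center, axis=1) <= radius).astype(int)


def instability(presence, background, eval_points, frac, n_pairs=50, seed=0):
    """Average disagreement between classifiers fitted on two bootstrap resamples."""
    rng = np.random.default_rng(seed)
    disagreements = []
    for _ in range(n_pairs):
        preds = []
        for _ in range(2):
            # Resample presence and background sets with replacement.
            p = presence[rng.integers(0, len(presence), len(presence))]
            b = background[rng.integers(0, len(background), len(background))]
            preds.append(centroid_rule(p, b, frac)(eval_points))
        disagreements.append(np.mean(preds[0] != preds[1]))
    return float(np.mean(disagreements))
```

Under these assumptions, one could compute `instability(presence, background, background, frac)` over a grid of `frac` values and prefer the value with the lowest score, which requires no absence labels.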
Volume (Year): 59 (2013)
Issue (Month): C
Contact details of provider: Web page: http://www.elsevier.com/locate/csda
Handle: RePEc:eee:csdana:v:59:y:2013:i:c:p:134-143