
A statistical approach to high-throughput screening of predicted orthologs

Author

Listed:
  • Min, Jeong Eun
  • Whiteside, Matthew D.
  • Brinkman, Fiona S.L.
  • McNeney, Brad
  • Graham, Jinko

Abstract

Orthologs are genes in different species that have diverged from a common ancestral gene after speciation. In contrast, paralogs are genes that have diverged after a gene duplication event. For many comparative analyses, it is of interest to identify orthologs with similar functions. Such orthologs tend to support species divergence (ssd-orthologs) in the sense that they have diverged only due to speciation, to the same relative degree as their species. However, due to incomplete sequencing or gene loss in a species, predicted orthologs can sometimes be paralogs or other non-ssd-orthologs. To increase the specificity of ssd-ortholog prediction, Fulton et al. [Fulton, D., Li, Y., Laird, M., Horsman, B., Roche, F., Brinkman, F., 2006. Improving the specificity of high-throughput ortholog prediction. BMC Bioinformatics 7 (1), 270] developed Ortholuge, a bioinformatics tool that identifies predicted orthologs with atypical genetic divergence. However, when the initial list of putative orthologs contains a non-negligible number of non-ssd-orthologs, the cut-off values that Ortholuge generates for orthology classification are difficult to interpret and can be too high, leading to decreased specificity of ssd-ortholog prediction. Therefore, we propose a complementary statistical approach to determining cut-off values. A benefit of the proposed approach is that it gives the user an estimated conditional probability that a predicted ortholog pair is unusually diverged. This enables the interpretation and selection of cut-off values based on a direct measure of the relative composition of ssd-orthologs versus non-ssd-orthologs. In a simulation comparison of the two approaches, we find that the statistical approach provides more stable cut-off values and improves the specificity of ssd-ortholog prediction for low-quality data sets of predicted orthologs.
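
The idea of choosing a cut-off from an estimated conditional probability can be illustrated with a small two-group mixture. The sketch below is not the authors' implementation. It assumes, purely for illustration, that a divergence score for typical (ssd-ortholog) pairs and for unusually diverged pairs each follows a normal distribution with a common standard deviation, and that the mixing proportion is already known. The function names, parameter values, and the normality assumption are all hypothetical and do not come from the article.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_unusually_diverged(score, p_typical, typical, atypical):
    """Conditional probability that a pair with this score is atypical.

    typical and atypical are (mean, sd) tuples; p_typical is the assumed
    proportion of typical pairs in the list of predicted orthologs.
    """
    f0 = p_typical * normal_pdf(score, *typical)
    f1 = (1.0 - p_typical) * normal_pdf(score, *atypical)
    return f1 / (f0 + f1)


def cutoff_for_probability(target, p_typical, typical, atypical,
                           lo, hi, tol=1e-9):
    """Score at which the conditional probability equals target.

    Uses bisection, so it assumes the probability increases with the
    score on [lo, hi] (true here because both groups share one sd).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_unusually_diverged(mid, p_typical, typical, atypical) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters: typical pairs ~ N(0, 1), atypical pairs ~ N(3, 1).
TYPICAL = (0.0, 1.0)
ATYPICAL = (3.0, 1.0)

# Cut-off where a pair is as likely atypical as typical, for a
# higher-quality list (90% typical) and a lower-quality one (70% typical).
cutoff_high_quality = cutoff_for_probability(0.5, 0.9, TYPICAL, ATYPICAL, 0.0, 6.0)
cutoff_low_quality = cutoff_for_probability(0.5, 0.7, TYPICAL, ATYPICAL, 0.0, 6.0)
```

In this toy setting the cut-off is tied to an explicit probability, so it moves in an interpretable way as the assumed share of atypical pairs in the input list changes. In practice the mixture components and proportions would have to be estimated from the data rather than fixed in advance.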

Suggested Citation

  • Min, Jeong Eun & Whiteside, Matthew D. & Brinkman, Fiona S.L. & McNeney, Brad & Graham, Jinko, 2011. "A statistical approach to high-throughput screening of predicted orthologs," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 935-943, January.
  • Handle: RePEc:eee:csdana:v:55:y:2011:i:1:p:935-943

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(10)00317-8
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Efron, Bradley, 2004. "Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 96-104, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pounds Stanley B. & Gao Cuilan L. & Zhang Hui, 2012. "Empirical Bayesian Selection of Hypothesis Testing Procedures for Analysis of Sequence Count Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-32, October.
    2. Hai Shu & Bin Nan & Robert Koeppe, 2015. "Multiple testing for neuroimaging via hidden Markov random field," Biometrics, The International Biometric Society, vol. 71(3), pages 741-750, September.
    3. Yong Wang, 2009. "The constrained Fisher scoring method for maximum likelihood computation of a nonparametric mixing distribution," Computational Statistics, Springer, vol. 24(1), pages 67-81, February.
    4. Bilgrau, Anders Ellern & Eriksen, Poul Svante & Rasmussen, Jakob Gulddahl & Johnsen, Hans Erik & Dybkaer, Karen & Boegsted, Martin, 2016. "GMCM: Unsupervised Clustering and Meta-Analysis Using Gaussian Mixture Copula Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i02).
    5. Gary L Gadbury & Qinfang Xiang & Lin Yang & Stephen Barnes & Grier P Page & David B Allison, 2008. "Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates," PLOS Genetics, Public Library of Science, vol. 4(6), pages 1-8, June.
    6. Campbell R. Harvey & Yan Liu & Heqing Zhu, 2014. ". . . and the Cross-Section of Expected Returns," NBER Working Papers 20592, National Bureau of Economic Research, Inc.
    7. Shigeyuki Matsui & Hisashi Noma, 2011. "Estimating Effect Sizes of Differentially Expressed Genes for Power and Sample-Size Assessments in Microarray Experiments," Biometrics, The International Biometric Society, vol. 67(4), pages 1225-1235, December.
    8. Patrick Kline & Christopher Walters, 2019. "Audits as Evidence: Experiments, Ensembles, and Enforcement," Papers 1907.06622, arXiv.org, revised Jul 2019.
    9. Raphael Gottardo & Wei Li & W. Evan Johnson & X. Shirley Liu, 2008. "A Flexible and Powerful Bayesian Hierarchical Model for ChIP–Chip Experiments," Biometrics, The International Biometric Society, vol. 64(2), pages 468-478, June.
    10. Sairam Rayaprolu & Zhiyi Chi, 2021. "False Discovery Variance Reduction in Large Scale Simultaneous Hypothesis Tests," Methodology and Computing in Applied Probability, Springer, vol. 23(3), pages 711-733, September.
    11. Won, Joong-Ho & Lim, Johan & Yu, Donghyeon & Kim, Byung Soo & Kim, Kyunga, 2014. "Monotone false discovery rate," Statistics & Probability Letters, Elsevier, vol. 87(C), pages 86-93.
    12. David R. Bickel, 2014. "Small-scale Inference: Empirical Bayes and Confidence Methods for as Few as a Single Comparison," International Statistical Review, International Statistical Institute, vol. 82(3), pages 457-476, December.
    13. Pan, Lanfeng & Li, Yehua & He, Kevin & Li, Yanming & Li, Yi, 2020. "Generalized linear mixed models with Gaussian mixture random effects: Inference and application," Journal of Multivariate Analysis, Elsevier, vol. 175(C).
    14. van Wieringen, Wessel N. & Stam, Koen A. & Peeters, Carel F.W. & van de Wiel, Mark A., 2020. "Updating of the Gaussian graphical model through targeted penalized estimation," Journal of Multivariate Analysis, Elsevier, vol. 178(C).
    15. Ian W. McKeague & Min Qian, 2015. "An Adaptive Resampling Test for Detecting the Presence of Significant Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1422-1433, December.
    16. Hefei Zhang & Xuhang Li & Dongyuan Song & Onur Yukselen & Shivani Nanda & Alper Kucukural & Jingyi Jessica Li & Manuel Garber & Albertha J. M. Walhout, 2025. "Worm Perturb-Seq: massively parallel whole-animal RNAi and RNA-seq," Nature Communications, Nature, vol. 16(1), pages 1-21, December.
    17. Angela Schörgendorfer & Adam J. Branscum & Timothy E. Hanson, 2013. "A Bayesian Goodness of Fit Test and Semiparametric Generalization of Logistic Regression with Measurement Data," Biometrics, The International Biometric Society, vol. 69(2), pages 508-519, June.
    18. Zhao, Haibing & Fung, Wing Kam, 2016. "A powerful FDR control procedure for multiple hypotheses," Computational Statistics & Data Analysis, Elsevier, vol. 98(C), pages 60-70.
    19. T. Tony Cai & Wenguang Sun & Weinan Wang, 2019. "Covariate‐assisted ranking and screening for large‐scale two‐sample inference," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 187-234, April.
    20. Hong, Zhaoping & Lian, Heng, 2012. "BOPA: A Bayesian hierarchical model for outlier expression detection," Computational Statistics & Data Analysis, Elsevier, vol. 56(12), pages 4146-4156.

