IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v8y2020i2p169-d314961.html
   My bibliography  Save this article

Prediction of Extracellular Matrix Proteins by Fusing Multiple Feature Information, Elastic Net, and Random Forest Algorithm

Author

Listed:
  • Minghui Wang

    (College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
    Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China)

  • Lingling Yue

    (College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
    Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China)

  • Xiaowen Cui

    (College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
    Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China)

  • Cheng Chen

    (College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
    Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China)

  • Hongyan Zhou

    (College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
    Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China)

  • Qin Ma

    (Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA)

  • Bin Yu

    (College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
    Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
    School of Life Sciences, University of Science and Technology of China, Hefei 230027, China)

Abstract

Extracellular matrix (ECM) proteins play an important role in a series of biological processes of cells. The study of ECM proteins is helpful to further comprehend their biological functions. We propose ECMP-RF (extracellular matrix proteins prediction by random forest) to predict ECM proteins. Firstly, the features of the protein sequence are extracted by combining encoding based on grouped weight, pseudo amino-acid composition, pseudo position-specific scoring matrix, a local descriptor, and an autocorrelation descriptor. Secondly, the synthetic minority oversampling technique (SMOTE) algorithm is employed to process the class imbalance data, and the elastic net (EN) is used to reduce the dimension of the feature vectors. Finally, the random forest (RF) classifier is used to predict the ECM proteins. Leave-one-out cross-validation shows that the balanced accuracy of the training and testing datasets is 97.3% and 97.9%, respectively. Compared with other state-of-the-art methods, ECMP-RF is significantly better than other predictors.

Suggested Citation

  • Minghui Wang & Lingling Yue & Xiaowen Cui & Cheng Chen & Hongyan Zhou & Qin Ma & Bin Yu, 2020. "Prediction of Extracellular Matrix Proteins by Fusing Multiple Feature Information, Elastic Net, and Random Forest Algorithm," Mathematics, MDPI, vol. 8(2), pages 1-18, January.
  • Handle: RePEc:gam:jmathe:v:8:y:2020:i:2:p:169-:d:314961
    as

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/8/2/169/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/8/2/169/
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    3. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    2. Margherita Giuzio, 2017. "Genetic algorithm versus classical methods in sparse index tracking," Decisions in Economics and Finance, Springer;Associazione per la Matematica, vol. 40(1), pages 243-256, November.
    3. Mkhadri, Abdallah & Ouhourane, Mohamed, 2013. "An extended variable inclusion and shrinkage algorithm for correlated variables," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 631-644.
    4. Yize Zhao & Matthias Chung & Brent A. Johnson & Carlos S. Moreno & Qi Long, 2016. "Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1427-1439, October.
    5. Christopher J Greenwood & George J Youssef & Primrose Letcher & Jacqui A Macdonald & Lauryn J Hagg & Ann Sanson & Jenn Mcintosh & Delyse M Hutchinson & John W Toumbourou & Matthew Fuller-Tyszkiewicz &, 2020. "A comparison of penalised regression methods for informing the selection of predictive markers," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-14, November.
    6. Norman R. Swanson & Weiqi Xiong, 2018. "Big data analytics in economics: What have we learned so far, and where should we go from here?," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 51(3), pages 695-746, August.
    7. Gareth M. James & Peter Radchenko & Jinchi Lv, 2009. "DASSO: connections between the Dantzig selector and lasso," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(1), pages 127-142, January.
    8. Mostafa Rezaei & Ivor Cribben & Michele Samorani, 2021. "A clustering-based feature selection method for automatically generated relational attributes," Annals of Operations Research, Springer, vol. 303(1), pages 233-263, August.
    9. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    10. Camila Epprecht & Dominique Guegan & Álvaro Veiga & Joel Correa da Rosa, 2017. "Variable selection and forecasting via automated methods for linear models: LASSO/adaLASSO and Autometrics," Post-Print halshs-00917797, HAL.
    11. Wang Zhu & Wang C.Y., 2010. "Buckley-James Boosting for Survival Analysis with High-Dimensional Biomarker Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-33, June.
    12. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    13. repec:jss:jstsof:33:i01 is not listed on IDEAS
    14. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
    15. Bartosz Uniejewski, 2024. "Regularization for electricity price forecasting," Papers 2404.03968, arXiv.org.
    16. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    17. Peter Martey Addo & Dominique Guegan & Bertrand Hassani, 2018. "Credit Risk Analysis Using Machine and Deep Learning Models," Risks, MDPI, vol. 6(2), pages 1-20, April.
    18. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    19. Yu-Min Yen, 2010. "A Note on Sparse Minimum Variance Portfolios and Coordinate-Wise Descent Algorithms," Papers 1005.5082, arXiv.org, revised Sep 2013.
    20. Costa, Alexandre Bonnet R. & Ferreira, Pedro Cavalcanti G. & Gaglianone, Wagner P. & Guillén, Osmani Teixeira C. & Issler, João Victor & Lin, Yihao, 2021. "Machine learning and oil price point and density forecasting," Energy Economics, Elsevier, vol. 102(C).
    21. Tomáš Plíhal, 2021. "Scheduled macroeconomic news announcements and Forex volatility forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1379-1397, December.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:8:y:2020:i:2:p:169-:d:314961. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.