
Personalized breast cancer onset prediction from lifestyle and health history information

Author

Listed:
  • Shi-ang Qi
  • Neeraj Kumar
  • Jian-Yi Xu
  • Jaykumar Patel
  • Sambasivarao Damaraju
  • Grace Shen-Tu
  • Russell Greiner

Abstract

We propose a method to predict when a woman will develop breast cancer (BCa) from her lifestyle and health history features. To this end, we use data on 18,288 women from Alberta’s Tomorrow Project to train Individual Survival Distribution (ISD) models that predict an individual’s Breast-Cancer-Onset (BCaO) probability curve. We show that our three-step approach, which (1) fills in missing data using multiple imputation by chained equations, (2) selects features with the multivariate Cox method, and (3) uses MTLR to learn an ISD model, produced the model with the smallest L1-Hinge loss among all calibrated models with comparable C-index. We also identified 7 actionable lifestyle features that a woman can modify, and we illustrate how this model can predict the quantitative effects of those changes, suggesting how much each could extend her BCa-free time. We anticipate that this approach could be used to identify appropriate interventions for individuals with a higher likelihood of developing BCa in their lifetime.
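
The abstract compares models by L1-Hinge loss and C-index. As a rough illustration of what those two quantities measure, the sketch below computes both on a toy set of predicted and observed onset times. It is not the authors' code: the function names, the toy numbers, and the particular formulations (mean absolute error with a one-sided hinge for censored cases, and a Harrell-style pairwise concordance) are assumptions chosen for illustration, and the paper may define or normalize them differently.

```python
from itertools import combinations


def l1_hinge_loss(predicted, observed, event):
    """Mean L1-Hinge loss (one common formulation, assumed here).

    Uncensored case: absolute error |predicted - observed|.
    Censored case: only under-prediction is penalized, because the true
    onset time is known to be at least the censoring time.
    """
    total = 0.0
    for p, t, e in zip(predicted, observed, event):
        if e:
            total += abs(p - t)
        else:
            total += max(0.0, t - p)
    return total / len(predicted)


def concordance_index(predicted, observed, event):
    """Harrell-style C-index on predicted times (illustrative version).

    A pair is comparable when the subject with the shorter observed time
    actually had the event. It is concordant when that subject also has
    the shorter predicted time; tied predictions count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(observed)), 2):
        # Put the subject with the shorter observed time first.
        a, b = (i, j) if observed[i] <= observed[j] else (j, i)
        if observed[a] == observed[b] or not event[a]:
            continue  # tied times or censored-first pairs are skipped
        comparable += 1
        if predicted[a] < predicted[b]:
            concordant += 1.0
        elif predicted[a] == predicted[b]:
            concordant += 0.5
    return concordant / comparable


# Hypothetical toy data: times in years, event=False means censored.
predicted = [5.0, 12.0, 4.0, 20.0]
observed = [6.0, 10.0, 9.0, 15.0]
event = [True, False, True, False]

loss = l1_hinge_loss(predicted, observed, event)
cindex = concordance_index(predicted, observed, event)
print(f"L1-Hinge loss: {loss:.2f}")  # 1.50
print(f"C-index: {cindex:.2f}")      # 0.80
```

On this toy data, the two censored women contribute no loss because their predicted times exceed their censoring times, while one of the five comparable pairs is ranked in the wrong order. A lower L1-Hinge loss indicates predicted times closer to the observed ones; a higher C-index indicates better ranking of who develops BCa sooner.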

Suggested Citation

  • Shi-ang Qi & Neeraj Kumar & Jian-Yi Xu & Jaykumar Patel & Sambasivarao Damaraju & Grace Shen-Tu & Russell Greiner, 2022. "Personalized breast cancer onset prediction from lifestyle and health history information," PLOS ONE, Public Library of Science, vol. 17(12), pages 1-23, December.
  • Handle: RePEc:plo:pone00:0279174
    DOI: 10.1371/journal.pone.0279174

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0279174
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0279174&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qin Yu & Xin Zhou & Jia Zhou & Zemin Zheng, 2026. "Confidence intervals for high-dimensional accelerated failure time models under measurement errors," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 32(1), pages 1-18, March.
    2. Soave, David & Lawless, Jerald F., 2023. "Regularized regression for two phase failure time studies," Computational Statistics & Data Analysis, Elsevier, vol. 182(C).
    3. Liang, Weijuan & Zhang, Qingzhao & Ma, Shuangge, 2024. "Hierarchical false discovery rate control for high-dimensional survival analysis with interactions," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    4. Hua Xin & Yuhlong Lio & Hsien-Ching Chen & Tzong-Ru Tsai, 2024. "Zero-Inflated Binary Classification Model with Elastic Net Regularization," Mathematics, MDPI, vol. 12(19), pages 1-17, September.
    5. Zhiping Qiu & Jing Qin & Yong Zhou, 2016. "Composite Estimating Equation Method for the Accelerated Failure Time Model with Length-biased Sampling Data," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(2), pages 396-415, June.
    6. Zemin Zheng & Jie Zhang & Yang Li, 2022. "L 0 -Regularized Learning for High-Dimensional Additive Hazards Regression," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2762-2775, September.
    7. Wang Zhu & Wang C.Y., 2010. "Buckley-James Boosting for Survival Analysis with High-Dimensional Biomarker Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-33, June.
    8. Simon Bussy & Mokhtar Z. Alaya & Anne‐Sophie Jannot & Agathe Guilloux, 2022. "Binacox: automatic cut‐point detection in high‐dimensional Cox model with applications in genetics," Biometrics, The International Biometric Society, vol. 78(4), pages 1414-1426, December.
    9. Biagini, Francesca & Groll, Andreas & Widenmann, Jan, 2013. "Intensity-based premium evaluation for unemployment insurance products," Insurance: Mathematics and Economics, Elsevier, vol. 53(1), pages 302-316.
    10. Benedicte Sjo Tislevoll & Monica Hellesøy & Oda Helen Eck Fagerholt & Stein-Erik Gullaksen & Aashish Srivastava & Even Birkeland & Dimitrios Kleftogiannis & Pilar Ayuda-Durán & Laure Piechaczyk & Dagi, 2023. "Early response evaluation by single cell signaling profiling in acute myeloid leukemia," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    11. Paul Janssen & Noël Veraverbeke, 2024. "Nonparametric estimation of univariate and bivariate survival functions under right censoring: a survey," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 87(3), pages 211-245, April.
    12. Gabriela Ciuperca, 2025. "Right-censored models by the expectile method," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 31(1), pages 149-186, January.
    13. Matthew F Dixon, 2017. "A High Frequency Trade Execution Model for Supervised Learning," Papers 1710.03870, arXiv.org, revised Dec 2017.
    14. Ruoqing Zhu & Ying-Qi Zhao & Guanhua Chen & Shuangge Ma & Hongyu Zhao, 2017. "Greedy outcome weighted tree learning of optimal personalized treatment rules," Biometrics, The International Biometric Society, vol. 73(2), pages 391-400, June.
    15. Simmons, Sally Sonia, 2023. "Strikes and gutters: biomarkers and anthropometric measures for predicting diagnosed diabetes mellitus in adults in low- and middle-income countries," LSE Research Online Documents on Economics 120395, London School of Economics and Political Science, LSE Library.
    16. Leandro C. Hermida & E. Michael Gertz & Eytan Ruppin, 2022. "RETRACTED ARTICLE: Predicting cancer prognosis and drug response from the tumor microbiome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    17. Guessoum Zohra & Ould-Said Elias, 2009. "On nonparametric estimation of the regression function under random censorship model," Statistics & Risk Modeling, De Gruyter, vol. 26(3), pages 159-177, April.
    18. Weiyu Li & Valentin Patilea, 2018. "A dimension reduction approach for conditional Kaplan–Meier estimators," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 27(2), pages 295-315, June.
    19. Sungwan Bang & Soo-Heang Eo & Yong Mee Cho & Myoungshic Jhun & HyungJun Cho, 2016. "Non-crossing weighted kernel quantile regression with right censored data," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 22(1), pages 100-121, January.
    20. Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2015. "Tree-based censored regression with applications to insurance," Working Papers hal-01141228, HAL.
