
Semi-supervised approach to event time annotation using longitudinal electronic health records

Author

Listed:
  • Liang Liang

    (Harvard T. H. Chan School of Public Health)

  • Jue Hou

    (Harvard T. H. Chan School of Public Health)

  • Hajime Uno

    (Dana-Farber Cancer Institute)

  • Kelly Cho

    (Massachusetts Veterans Epidemiology Research and Information Center, US Department of Veterans Affairs
    Brigham and Women’s Hospital, Harvard Medical School)

  • Yanyuan Ma

    (Penn State University)

  • Tianxi Cai

    (Harvard T. H. Chan School of Public Health
    Harvard Medical School)

Abstract

Large clinical datasets derived from insurance claims and electronic health record (EHR) systems are valuable sources for precision medicine research. These datasets can be used to develop models for personalized prediction of risk or treatment response. Efficiently deriving prediction models from real-world data, however, faces practical and methodological challenges. Precise information on important clinical outcomes, such as time to cancer progression, is not readily available in these databases. The true clinical event times typically cannot be approximated well from simple extracts of billing or procedure codes, while annotating event times manually is prohibitively time- and resource-intensive. In this paper, we propose a two-step semi-supervised multi-modal automated time annotation (MATA) method leveraging multi-dimensional longitudinal EHR encounter records. In step I, we employ a functional principal component analysis approach to estimate the underlying intensity functions based on observed point processes from the unlabeled patients. In step II, we fit a penalized proportional odds model to the event time outcomes in the labeled data, using the features derived in step I and approximating the non-parametric baseline function with B-splines. Under regularity conditions, the resulting estimator of the feature effect vector is shown to be root-n consistent. We demonstrate the superiority of our approach relative to existing approaches through simulations and a real data example on annotating lung cancer recurrence in an EHR cohort of lung cancer patients from the Veterans Health Administration.
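
To make step I of the abstract more concrete, the following is a minimal sketch of the general idea: turn each patient's encounter times into an intensity curve and extract principal component scores as features. It is not the authors' estimator. It assumes a crude histogram estimate of the intensity on a common time grid and an SVD-based principal component analysis of the discretized curves. The simulated cohort, the 24-month follow-up window, the encounter rates, and the function names `binned_intensity` and `fpca_scores` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def binned_intensity(event_times, grid_edges):
    """Histogram estimate of one patient's encounter intensity (events per unit time)."""
    counts, _ = np.histogram(event_times, bins=grid_edges)
    return counts / np.diff(grid_edges)


def fpca_scores(curves, n_components):
    """Discretized FPCA: SVD of the centered curve matrix (rows = patients)."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]          # orthonormal eigenfunctions on the grid
    scores = centered @ components.T        # per-patient features
    explained = (s ** 2)[:n_components] / np.sum(s ** 2)
    return mean_curve, components, scores, explained


# Hypothetical unlabeled cohort: encounters become more frequent after a
# latent event time (an assumption made purely for this simulation).
rng = np.random.default_rng(0)
follow_up = 24.0                                  # months, assumed
grid_edges = np.linspace(0.0, follow_up, 13)      # 12 two-month bins
n_patients = 200

curves = np.empty((n_patients, len(grid_edges) - 1))
for i in range(n_patients):
    change = rng.uniform(6.0, 18.0)               # latent event time
    n_before = rng.poisson(0.3 * change)
    n_after = rng.poisson(2.0 * (follow_up - change))
    times = np.concatenate([
        rng.uniform(0.0, change, n_before),
        rng.uniform(change, follow_up, n_after),
    ])
    curves[i] = binned_intensity(times, grid_edges)

mean_curve, components, scores, explained = fpca_scores(curves, n_components=2)
print("score matrix shape:", scores.shape)
print("variance explained by each component:", explained)
```

In a pipeline of the kind the abstract describes, scores like these would serve as the derived features passed to the step II model fitted on the labeled patients. That step (the penalized proportional odds fit with a B-spline baseline) is not implemented here.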

Suggested Citation

  • Liang Liang & Jue Hou & Hajime Uno & Kelly Cho & Yanyuan Ma & Tianxi Cai, 2022. "Semi-supervised approach to event time annotation using longitudinal electronic health records," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 28(3), pages 428-491, July.
  • Handle: RePEc:spr:lifeda:v:28:y:2022:i:3:d:10.1007_s10985-022-09557-5
    DOI: 10.1007/s10985-022-09557-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10985-022-09557-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



