Printed from https://ideas.repec.org/a/eee/csdana/v108y2017icp52-69.html

Tracking concept drift using a constrained penalized regression combiner

Authors

  • Wang, Li-Yu
  • Park, Cheolwoo
  • Yeon, Kyupil
  • Choi, Hosik

Abstract

The objective of this work is to develop a predictive model when data batches are collected in a sequential manner. With streaming data, information is constantly being updated and a major statistical challenge for these types of data is that the underlying distribution and the true input–output dependency might change over time, a phenomenon known as concept drift. The concept drift phenomenon makes the learning process complicated because a predictive model constructed on the past data is no longer consistent with new examples. In order to effectively track concept drift, we propose model-combining methods using constrained and penalized regression that possesses a grouping property. The new learning methods enable us to select data batches as a group that are relevant to the current one, reduce the effects of irrelevant batches, and adaptively reflect the degree of concept drift emerging in data streams. We demonstrate the finite sample performance of the proposed method using simulated and real examples. The analytical and empirical results indicate that the proposed methods can effectively adapt to various types of concept drift.
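
The combining idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: the abstract does not state the constraints or the penalty, so the sketch assumes weights that are nonnegative and sum to one, and it swaps in a smooth squared-difference penalty between adjacent batch weights in place of a grouping penalty. The names `project_to_simplex`, `fit_combiner`, `lam`, and `n_iter` are hypothetical, and the data are synthetic.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # the last k that passes gives the threshold
    return [max(x - theta, 0.0) for x in v]


def fit_combiner(preds, y, lam=0.05, n_iter=500):
    """Projected gradient descent for combiner weights (illustrative).

    preds[i][j] is the prediction of the model fit on batch j for the
    i-th example of the current batch; y[i] is the observed response.
    Minimizes mean squared error plus lam * sum_j (w[j+1] - w[j])**2
    subject to the assumed simplex constraint on w.
    """
    n, m = len(preds), len(preds[0])
    # Conservative step size: the trace of P'P/n bounds its largest
    # eigenvalue, and the path-graph Laplacian's is at most 4.
    trace = sum(p * p for row in preds for p in row) / n
    step = 1.0 / (2.0 * trace + 8.0 * lam)
    w = [1.0 / m] * m
    for _ in range(n_iter):
        resid = [sum(p * wj for p, wj in zip(row, w)) - yi
                 for row, yi in zip(preds, y)]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(resid, preds))
                for j in range(m)]
        for j in range(m):
            if j > 0:
                grad[j] += 2.0 * lam * (w[j] - w[j - 1])
            if j < m - 1:
                grad[j] -= 2.0 * lam * (w[j + 1] - w[j])
        w = project_to_simplex([wj - step * g for wj, g in zip(w, grad)])
    return w


# Synthetic example: four past-batch models, of which only the two most
# recent ones are relevant to the current batch (a simulated drift).
random.seed(0)
preds = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [0.5 * row[2] + 0.5 * row[3] + random.gauss(0, 0.1) for row in preds]
weights = fit_combiner(preds, y)
print([round(wj, 3) for wj in weights])
```

In this toy setting the weights should concentrate on the last two batches, which mirrors the stated goal of down-weighting irrelevant batches. A squared-difference penalty only pulls neighbouring weights toward each other; producing exactly equal weights within a group would need a non-smooth penalty of the kind used in the fused lasso or grouping pursuit literature.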

Suggested Citation

  • Wang, Li-Yu & Park, Cheolwoo & Yeon, Kyupil & Choi, Hosik, 2017. "Tracking concept drift using a constrained penalized regression combiner," Computational Statistics & Data Analysis, Elsevier, vol. 108(C), pages 52-69.
  • Handle: RePEc:eee:csdana:v:108:y:2017:i:c:p:52-69
    DOI: 10.1016/j.csda.2016.11.002

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947316302663
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Shen, Xiaotong & Huang, Hsin-Cheng, 2010. "Grouping Pursuit Through a Regularization Solution Surface," Journal of the American Statistical Association, American Statistical Association, vol. 105(490), pages 727-739.
    2. Robert Tibshirani & Michael Saunders & Saharon Rosset & Ji Zhu & Keith Knight, 2005. "Sparsity and smoothness via the fused lasso," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(1), pages 91-108, February.

    Citations

    Cited by:

    1. Ippel, L. & Kaptein, M.C. & Vermunt, J.K., 2019. "Online estimation of individual-level effects using streaming shrinkage factors," Computational Statistics & Data Analysis, Elsevier, vol. 137(C), pages 16-32.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jeon, Jong-June & Kwon, Sunghoon & Choi, Hosik, 2017. "Homogeneity detection for the high-dimensional generalized linear model," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 61-74.
    2. Peter Radchenko & Gourab Mukherjee, 2017. "Convex clustering via l1 fusion penalization," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(5), pages 1527-1546, November.
    3. Zheng Tracy Ke & Jianqing Fan & Yichao Wu, 2015. "Homogeneity Pursuit," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 175-194, March.
    4. Liu, Lili & Lin, Lu, 2019. "Subgroup analysis for heterogeneous additive partially linear models and its application to car sales data," Computational Statistics & Data Analysis, Elsevier, vol. 138(C), pages 239-259.
    5. Yuan Yan & Hsin-Cheng Huang & Marc G. Genton, 2021. "Vector Autoregressive Models with Spatially Structured Coefficients for Time Series on a Spatial Grid," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(3), pages 387-408, September.
    6. Sunkyung Kim & Wei Pan & Xiaotong Shen, 2013. "Network-Based Penalized Regression With Application to Genomic Data," Biometrics, The International Biometric Society, vol. 69(3), pages 582-593, September.
    7. Lu Tang & Peter X.‐K. Song, 2021. "Poststratification fusion learning in longitudinal data analysis," Biometrics, The International Biometric Society, vol. 77(3), pages 914-928, September.
    8. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    9. Mkhadri, Abdallah & Ouhourane, Mohamed, 2013. "An extended variable inclusion and shrinkage algorithm for correlated variables," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 631-644.
    10. Yize Zhao & Matthias Chung & Brent A. Johnson & Carlos S. Moreno & Qi Long, 2016. "Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1427-1439, October.
    11. Francis X. Diebold & Kamil Yilmaz, 2016. "Trans-Atlantic Equity Volatility Connectedness: U.S. and European Financial Institutions, 2004–2014," Journal of Financial Econometrics, Oxford University Press, vol. 14(1), pages 81-127.
    12. Jian Guo & Elizaveta Levina & George Michailidis & Ji Zhu, 2010. "Pairwise Variable Selection for High-Dimensional Model-Based Clustering," Biometrics, The International Biometric Society, vol. 66(3), pages 793-804, September.
    13. Franck Rapaport & Christina Leslie, 2010. "Determining Frequent Patterns of Copy Number Alterations in Cancer," PLOS ONE, Public Library of Science, vol. 5(8), pages 1-10, August.
    14. Lu Tang & Ling Zhou & Peter X. K. Song, 2019. "Fusion learning algorithm to combine partially heterogeneous Cox models," Computational Statistics, Springer, vol. 34(1), pages 395-414, March.
    15. Young‐Geun Choi & Lawrence P. Hanrahan & Derek Norton & Ying‐Qi Zhao, 2022. "Simultaneous spatial smoothing and outlier detection using penalized regression, with application to childhood obesity surveillance from electronic health records," Biometrics, The International Biometric Society, vol. 78(1), pages 324-336, March.
    16. Molly C. Klanderman & Kathryn B. Newhart & Tzahi Y. Cath & Amanda S. Hering, 2020. "Fault isolation for a complex decentralized waste water treatment facility," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(4), pages 931-951, August.
    17. Tomáš Plíhal, 2021. "Scheduled macroeconomic news announcements and Forex volatility forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1379-1397, December.
    18. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    19. Zhong, Yan & Sang, Huiyan & Cook, Scott J. & Kellstedt, Paul M., 2023. "Sparse spatially clustered coefficient model via adaptive regularization," Computational Statistics & Data Analysis, Elsevier, vol. 177(C).
    20. Murat Genç & M. Revan Özkale, 2021. "Usage of the GO estimator in high dimensional linear models," Computational Statistics, Springer, vol. 36(1), pages 217-239, March.
