Printed from IDEAS: https://ideas.repec.org/a/eee/ejores/v243y2015i1p177-189.html

Adapting a classification rule to local and global shift when only unlabelled data are available

Author

  • Hofer, Vera

Abstract

For evolving populations, the training data and the test data need not follow the same distribution. Thus, the performance of a prediction model will deteriorate over the course of time, which makes it necessary to re-estimate the model after some time. However, in many applications, e.g. credit scoring, new labelled data are not available for re-estimation due to verification latency, i.e. label delay. Methods that enable a prediction model to adapt to distributional changes using only unlabelled data are therefore highly desirable. A shift adaptation method for binary classification is presented here. The model is based on mixture distributions. The conditional feature distributions are determined at the time when labelled data are available, and the unconditional feature distribution is determined at the time when new unlabelled data become accessible. These mixture distributions provide information on the old and the new positions of subpopulations. A transition model then describes how the subpopulations of each class have drifted to form the new unconditional feature distribution. Assuming that the conditional distributions are reorganised using a minimum of energy, a two-step estimation procedure results. First, for a given class prior distribution, the transfer of probability mass is estimated such that the energy required to obtain the new unconditional distribution by a local transfer of the old conditional distributions is minimal. Since the optimal value of the resulting transportation problem measures the distance between the old and the new distributions, the change in the class prior distribution is found in a second step by solving the transportation problem for varying class prior distributions and selecting the value for which the objective function is minimal. Using the solution of the transportation problem and the component parameters of the unconditional feature distribution, the new conditional feature distribution can be determined.
This allows the classification rule to be adapted to the shift. The performance of the proposed model is investigated using a large real-world dataset on default rates of Danish companies. The results show that shift adaptation improves classification results.
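The two-step procedure described in the abstract can be illustrated with a toy sketch. This is not the author's implementation; it assumes a univariate feature, Gaussian mixture components sharing a known standard deviation, and an "energy" equal to the squared distance between component means. For a candidate class-1 prior p (so the priors are 1 - p and p), step 1 solves the transportation problem: minimise the sum of t[(k,i),j] * (mu_ki - nu_j)^2 subject to the sum over j of t[(k,i),j] = prior_k * w_ki (old component i of class k) and the sum over (k,i) of t[(k,i),j] = v_j (new component j with mean nu_j and weight v_j). On a line with a convex cost, the sorted (monotone) coupling is an optimal plan, so no LP solver is needed; with several features, a general transportation or LP solver would have to replace `monotone_transport`. Step 2 grid-searches p for the smallest cost, and the resulting plan gives each class's new weights over the new components, which feed a Bayes classification rule. All function names, the grid and the toy numbers are hypothetical.

```python
import math
from collections import defaultdict


def monotone_transport(src, dst, eps=1e-12):
    """1-D transportation problem with squared-distance cost.

    src, dst: lists of (position, mass, tag). On a line with a convex cost,
    the monotone (sorted) coupling is optimal. Returns (total_cost, plan),
    where plan[(src_tag, dst_tag)] is the mass moved between the two.
    """
    src = sorted(src, key=lambda s: s[0])
    dst = sorted(dst, key=lambda d: d[0])
    i = j = 0
    rs, rd = src[0][1], dst[0][1]  # remaining supply / demand
    cost, plan = 0.0, defaultdict(float)
    while i < len(src) and j < len(dst):
        m = min(rs, rd)
        cost += m * (src[i][0] - dst[j][0]) ** 2
        plan[(src[i][2], dst[j][2])] += m
        rs, rd = rs - m, rd - m
        if rs <= eps:  # old component fully transferred
            i += 1
            rs = src[i][1] if i < len(src) else 0.0
        if rd <= eps:  # new component fully filled
            j += 1
            rd = dst[j][1] if j < len(dst) else 0.0
    return cost, plan


def transport_cost(pi1, old, new):
    """Step 1 (hypothetical form): minimal energy to move the prior-weighted
    old class-conditional mixtures onto the new unconditional mixture."""
    priors = (1.0 - pi1, pi1)
    src = [(mu, priors[k] * w, (k, i))
           for k in (0, 1) for i, (mu, w) in enumerate(old[k])]
    dst = [(mu, v, j) for j, (mu, v) in enumerate(new)]
    return monotone_transport(src, dst)


def estimate_prior(old, new, grid=None):
    """Step 2: choose the class-1 prior with the smallest transport cost."""
    if grid is None:
        grid = [g / 100 for g in range(1, 100)]
    return min(grid, key=lambda p: transport_cost(p, old, new)[0])


def new_conditionals(pi1, old, new):
    """New class-conditional weights over the new mixture components."""
    _, plan = transport_cost(pi1, old, new)
    priors = (1.0 - pi1, pi1)
    weights = [[0.0] * len(new) for _ in (0, 1)]
    for ((k, _i), j), mass in plan.items():
        weights[k][j] += mass / priors[k]
    return weights


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def classify(x, pi1, weights, new, sd=1.0):
    """Shift-adapted Bayes rule: pick the class maximising prior * density."""
    priors = (1.0 - pi1, pi1)
    scores = [priors[k] * sum(w * normal_pdf(x, mu, sd)
                              for w, (mu, _v) in zip(weights[k], new))
              for k in (0, 1)]
    return 0 if scores[0] >= scores[1] else 1


# Toy data: both classes drift by +1 and the class-1 prior drops to 0.3.
old = [[(0.0, 1.0)], [(4.0, 1.0)]]  # (mean, weight) per class at training time
new = [(1.0, 0.7), (5.0, 0.3)]      # (mean, weight) of the new unlabelled mixture
pi_hat = estimate_prior(old, new)   # 0.3 on this toy example
w = new_conditionals(pi_hat, old, new)
print(pi_hat, w, classify(1.2, pi_hat, w, new), classify(4.8, pi_hat, w, new))
```

On this toy input, the transport cost is 8.2 - 24p for p <= 0.3 and 8p - 1.4 for p >= 0.3, so the grid search recovers p = 0.3, and each class keeps all of its mass on its own shifted component.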

Suggested Citation

  • Hofer, Vera, 2015. "Adapting a classification rule to local and global shift when only unlabelled data are available," European Journal of Operational Research, Elsevier, vol. 243(1), pages 177-189.
  • Handle: RePEc:eee:ejores:v:243:y:2015:i:1:p:177-189
    DOI: 10.1016/j.ejor.2014.11.022

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S037722171400945X
    Download Restriction: Full text for ScienceDirect subscribers only

    References listed on IDEAS

    1. Christophe Biernacki & Farid Beninel & Vincent Bretagnolle, 2002. "A Generalized Discriminant Rule When Training Population and Test Population Differ on Their Descriptive Parameters," Biometrics, The International Biometric Society, vol. 58(2), pages 387-397, June.
    2. P. Scobey & D. G. Kabe, 1981. "Direct Solutions to Some Multidimensional Transportation Problems," Transportation Science, INFORMS, vol. 15(1), pages 1-15, February.
    3. Masashi Sugiyama & Taiji Suzuki & Shinichi Nakajima & Hisashi Kashima & Paul Bünau & Motoaki Kawanabe, 2008. "Direct importance estimation for covariate shift adaptation," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 60(4), pages 699-746, December.
    4. Hand, D.J. & Vinciotti, V., 2003. "Local Versus Global Models for Classification Problems: Fitting Models Where it Matters," The American Statistician, American Statistical Association, vol. 57, pages 124-131, May.
    5. Hofer, Vera & Krempl, Georg, 2013. "Drift mining in data: A framework for addressing drift in classification," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 377-391.
    6. Yang, Yingxu, 2007. "Adaptive credit scoring with kernel learning methods," European Journal of Operational Research, Elsevier, vol. 183(3), pages 1521-1536, December.
    7. Crook, Jonathan N. & Edelman, David B. & Thomas, Lyn C., 2007. "Recent developments in consumer credit risk assessment," European Journal of Operational Research, Elsevier, vol. 183(3), pages 1447-1465, December.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    2. Dias, Sónia & Brito, Paula, 2017. "Off the beaten track: A new linear model for interval data," European Journal of Operational Research, Elsevier, vol. 258(3), pages 1118-1130.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Guotai Chi & Zhipeng Zhang, 2017. "Multi Criteria Credit Rating Model for Small Enterprise Using a Nonparametric Method," Sustainability, MDPI, vol. 9(10), pages 1-23, October.
    2. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    3. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    4. Huseyin Ince & Bora Aktan, 2009. "A comparison of data mining techniques for credit scoring in banking: A managerial perspective," Journal of Business Economics and Management, Taylor & Francis Journals, vol. 10(3), pages 233-240, March.
    5. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    6. Maria Rocha Sousa & João Gama & Elísio Brandão, 2013. "Introducing time-changing economics into credit scoring," FEP Working Papers 513, Universidade do Porto, Faculdade de Economia do Porto.
    7. Raffaella Calabrese, 2012. "Improving Classifier Performance Assessment of Credit Scoring Models," Working Papers 201204, Geary Institute, University College Dublin.
    8. Barbara Cavalletti & Corrado Lagazio & Daniela Vandone, 2008. "Il credito al consumo in Italia: benessere economico o fragilità finanziaria?" [Consumer credit in Italy: economic well-being or financial fragility?], Departmental Working Papers 2008-24, Department of Economics, Management and Quantitative Methods at Università degli Studi di Milano.
    9. Aïda Kammoun & Imen Triki, 2016. "Credit Scoring Models for a Tunisian Microfinance Institution: Comparison between Artificial Neural Network and Logistic Regression," Review of Economics & Finance, Better Advances Press, Canada, vol. 6, pages 61-78, February.
    10. Crone, Sven F. & Finlay, Steven, 2012. "Instance sampling in credit scoring: An empirical study of sample size and balancing," International Journal of Forecasting, Elsevier, vol. 28(1), pages 224-238.
    11. Michael Bucker & Gero Szepannek & Alicja Gosiewska & Przemyslaw Biecek, 2020. "Transparency, Auditability and eXplainability of Machine Learning Models in Credit Scoring," Papers 2009.13384, arXiv.org.
    12. Singh, Ramendra Pratap & Singh, Ramendra & Mishra, Prashant, 2021. "Does managing customer accounts receivable impact customer relationships, and sales performance? An empirical investigation," Journal of Retailing and Consumer Services, Elsevier, vol. 60(C).
    13. Charitou, Andreas & Dionysiou, Dionysia & Lambertides, Neophytos & Trigeorgis, Lenos, 2013. "Alternative bankruptcy prediction models using option-pricing theory," Journal of Banking & Finance, Elsevier, vol. 37(7), pages 2329-2341.
    14. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    15. Sheng, Haiyang & Yu, Guan, 2023. "TNN: A transfer learning classifier based on weighted nearest neighbors," Journal of Multivariate Analysis, Elsevier, vol. 193(C).
    16. Bernd Bischl & Julia Schiffner & Claus Weihs, 2013. "Benchmarking local classification methods," Computational Statistics, Springer, vol. 28(6), pages 2599-2619, December.
    17. Ting Sun & Miklos A. Vasarhelyi, 2018. "Predicting credit card delinquencies: An application of deep neural networks," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 25(4), pages 174-189, October.
    18. Raffaella Calabrese & Galina Andreeva & Jake Ansell, 2019. "“Birds of a Feather” Fail Together: Exploring the Nature of Dependency in SME Defaults," Risk Analysis, John Wiley & Sons, vol. 39(1), pages 71-84, January.
    19. Shuang Zhu & R. Pace, 2014. "Modeling Spatially Interdependent Mortgage Decisions," The Journal of Real Estate Finance and Economics, Springer, vol. 49(4), pages 598-620, November.
    20. Yu Xia & Ta Xu & Ming-Xia Wei & Zhen-Ke Wei & Lian-Jie Tang, 2023. "Predicting Chain’s Manufacturing SME Credit Risk in Supply Chain Finance Based on Machine Learning Methods," Sustainability, MDPI, vol. 15(2), pages 1-18, January.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.