Learning from Double Positive and Unlabeled Data for Potential-Customer Identification

Author

Listed:
  • Masahiro Kato
  • Yuki Ikeda
  • Kentaro Baba
  • Takashi Imai
  • Ryo Inokuchi

Abstract

In this study, we propose a method for identifying potential customers in targeted marketing by applying learning from positive and unlabeled data (PU learning). We consider a scenario in which a company sells a product and can observe only the customers who purchased it. Decision-makers seek to market products effectively based on whether people have loyalty to the company. Individuals with loyalty are those who are likely to remain interested in the company even without additional advertising. Consequently, those loyal customers would likely purchase from the company if they are interested in the product. In contrast, people with lower loyalty may overlook the product or buy similar products from other companies unless they receive marketing attention. Therefore, by focusing marketing efforts on individuals who are interested in the product but do not have strong loyalty, we can achieve more efficient marketing. To achieve this goal, we consider how to learn, from limited data, a classifier that identifies potential customers who (i) have interest in the product and (ii) do not have loyalty to the company. Although our algorithm comprises a single-stage optimization, its objective function implicitly contains two losses derived from standard PU learning settings. For this reason, we refer to our approach as double PU learning. We verify the validity of the proposed algorithm through numerical experiments, confirming that it functions appropriately for the problem at hand.
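
The abstract does not state the objective function itself. As background only, the following is a minimal sketch of one standard PU learning loss, the non-negative PU risk with a logistic loss, which belongs to the family of "standard PU learning settings" the abstract refers to. It is not the authors' double PU objective. The function names, the use of the logistic loss, and the assumption that the class prior (the share of positives among the unlabeled data) is known are all illustrative choices that do not come from the source.

```python
import numpy as np


def logistic_loss(margin):
    """Logistic loss log(1 + exp(-margin)), computed in a numerically stable way."""
    return np.logaddexp(0.0, -margin)


def nn_pu_risk(scores_pos, scores_unl, class_prior):
    """Non-negative PU risk for real-valued classifier scores.

    scores_pos  : scores on labeled positive samples (e.g. observed purchasers)
    scores_unl  : scores on unlabeled samples (a mix of positives and negatives)
    class_prior : assumed known share of positives among the unlabeled samples
    """
    # Loss for treating the labeled positives as positive.
    pos_as_pos = logistic_loss(scores_pos).mean()
    # Loss for treating the labeled positives as negative (used as a correction).
    pos_as_neg = logistic_loss(-scores_pos).mean()
    # Loss for treating every unlabeled sample as negative.
    unl_as_neg = logistic_loss(-scores_unl).mean()

    # Estimated risk on the unobserved negatives; clipped at zero so that the
    # empirical estimate cannot become negative.
    neg_part = unl_as_neg - class_prior * pos_as_neg
    return class_prior * pos_as_pos + max(0.0, neg_part)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical scores: positives tend to score higher than unlabeled samples.
    scores_pos = rng.normal(loc=1.0, size=200)
    scores_unl = rng.normal(loc=0.0, size=1000)
    print(nn_pu_risk(scores_pos, scores_unl, class_prior=0.3))
```

In practice the scores would come from a trainable model and this risk would be minimized over its parameters. How two such losses are combined into the single-stage double PU objective is specified in the paper, not here.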

Suggested Citation

  • Masahiro Kato & Yuki Ikeda & Kentaro Baba & Takashi Imai & Ryo Inokuchi, 2025. "Learning from Double Positive and Unlabeled Data for Potential-Customer Identification," Papers 2506.00436, arXiv.org, revised Jun 2025.
  • Handle: RePEc:arx:papers:2506.00436

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2506.00436
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Erard Brian, 2022. "Modeling Qualitative Outcomes by Supplementing Participant Data with General Population Data: A New and More Versatile Approach," Journal of Econometric Methods, De Gruyter, vol. 11(1), pages 35-53, January.
    2. Małgorzata Łazęcka & Jan Mielniczuk & Paweł Teisseyre, 2021. "Estimating the class prior for positive and unlabelled data via logistic regression," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 1039-1068, December.
    3. Wenkai Li & Yuanchi Liu & Ziyue Liu & Zhen Gao & Huabing Huang & Weijun Huang, 2022. "A Positive-Unlabeled Learning Algorithm for Urban Flood Susceptibility Modeling," Land, MDPI, vol. 11(11), pages 1-17, November.
    4. Robert M. Dorazio, 2012. "Predicting the Geographic Distribution of a Species from Presence-Only Data Subject to Detection Errors," Biometrics, The International Biometric Society, vol. 68(4), pages 1303-1312, December.
    5. Sung Jae Jun & Sokbae Lee, 2024. "Causal Inference Under Outcome-Based Sampling with Monotonicity Assumptions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 998-1009, July.
    6. Masahiro Kato & Shota Yasui, 2020. "Learning Classifiers under Delayed Feedback with a Time Window Assumption," Papers 2009.13092, arXiv.org, revised Jun 2022.
    7. Esmeralda A. Ramalho & Richard Smith, 2003. "Discrete choice non-response," CeMMAP working papers 07/03, Institute for Fiscal Studies.
    8. Erard, Brian & Langetieg, Patrick & Payne, Mark & Plumley, Alan, 2020. "Ghosts in the Income Tax Machinery," MPRA Paper 100036, University Library of Munich, Germany.
    9. Amanda Coston & Edward H. Kennedy, 2022. "The role of the geometric mean in case-control studies," Papers 2207.09016, arXiv.org.
    10. Schwemmer, Philipp & Güpner, Franziska & Adler, Sven & Klingbeil, Knut & Garthe, Stefan, 2016. "Modelling small-scale foraging habitat use in breeding Eurasian oystercatchers (Haematopus ostralegus) in relation to prey distribution and environmental predictors," Ecological Modelling, Elsevier, vol. 320(C), pages 322-333.
    11. Ashton, John & Burnett, Tim & Diaz-Rainey, Ivan & Ormosi, Peter, 2021. "Known unknowns: How much financial misconduct is detected and deterred?," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 74(C).
    12. Vincenzo Caponi & Miana Plesca, 2014. "Empirical characteristics of legal and illegal immigrants in the USA," Journal of Population Economics, Springer;European Society for Population Economics, vol. 27(4), pages 923-960, October.
    13. Sung Jae Jun & Sokbae (Simon) Lee, 2020. "Causal inference in case-control studies," CeMMAP working papers CWP19/20, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    14. Lee, Kangbok & Joo, Sunghoon & Baik, Hyeoncheol & Han, Sumin & In, Joonhwan, 2020. "Unbalanced data, type II error, and nonlinearity in predicting M&A failure," Journal of Business Research, Elsevier, vol. 109(C), pages 271-287.
    15. Saupe, E.E. & Barve, V. & Myers, C.E. & Soberón, J. & Barve, N. & Hensz, C.M. & Peterson, A.T. & Owens, H.L. & Lira-Noriega, A., 2012. "Variation in niche and distribution model performance: The need for a priori assessment of key causal factors," Ecological Modelling, Elsevier, vol. 237, pages 11-22.
    16. Becker, Bo & Cronqvist, Henrik & Fahlenbrach, Rüdiger, 2011. "Estimating the Effects of Large Shareholders Using a Geographic Instrument," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 46(4), pages 907-942, August.
    17. Adam M. Kleinbaum & Toby E. Stuart & Michael L. Tushman, 2013. "Discretion Within Constraint: Homophily and Structure in a Formal Organization," Organization Science, INFORMS, vol. 24(5), pages 1316-1336, October.
    18. Herkt, K. Matthias B. & Barnikel, Günter & Skidmore, Andrew K. & Fahr, Jakob, 2016. "A high-resolution model of bat diversity and endemism for continental Africa," Ecological Modelling, Elsevier, vol. 320(C), pages 9-28.
    19. Gill Ward & Trevor Hastie & Simon Barry & Jane Elith & John R. Leathwick, 2009. "Presence-Only Data and the EM Algorithm," Biometrics, The International Biometric Society, vol. 65(2), pages 554-563, June.
    20. Ramalho, Esmeralda A., 2007. "Binary models with misclassification in the variable of interest and nonignorable nonresponse," Economics Letters, Elsevier, vol. 96(1), pages 70-76, July.
