
Fraud Detection by Integrating Multisource Heterogeneous Presence-Only Data

Authors

Listed:
  • Yongqin Qiu

    (International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, China)

  • Yuanxing Chen

    (Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China)

  • Kan Fang

    (College of Management and Economics, Tianjin University, Tianjin 300072, China; and Laboratory of Computation and Analytics of Complex Management Systems (CACMS), Tianjin University, Tianjin 300072, China)

  • Lean Yu

    (Business School, Sichuan University, Chengdu 610065, China)

  • Kuangnan Fang

    (School of Economics, Xiamen University, Xiamen 316005, China)

Abstract

In credit fraud detection practice, some fraudulent transactions evade detection because fraudulent behavior is hidden by nature. To address this issue, a growing number of financial institutions have adopted positive-unlabeled (PU) learning techniques. However, most of these methods are designed for a single data set and do not account for the heterogeneity that arises when data are collected from different sources. In this paper, we propose an integrative PU learning method (I-PU) for pooling information from multiple heterogeneous PU data sets. A novel approach that penalizes group differences is developed to explicitly and automatically identify the cluster structures of coefficients across different data sets, thus offering a plausible interpretation of heterogeneity. Furthermore, we apply a bilevel selection method to detect the sparse structure at both the group level and the within-group level. Theoretically, we show that our proposed estimator has the oracle property. Computationally, we design an expectation-maximization (EM) algorithm framework and propose an alternating direction method of multipliers (ADMM) algorithm to solve it. Simulation results show that our proposed method has better numerical performance in terms of variable selection, parameter estimation, and prediction ability. Finally, a real-world application showcases the effectiveness of our method in identifying distinct coefficient clusters and its superior prediction performance compared with direct data merging or separate modeling. This result also offers valuable insights for financial institutions in developing targeted fraud detection systems.
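
As a rough illustration of the EM idea mentioned in the abstract, the sketch below fits a logistic model to a single positive-unlabeled data set. It is not the I-PU method: it uses one data source, no group-difference or sparsity penalties, and no ADMM. It also assumes that each true positive is labeled with a known probability `c`, independently of its features. That assumption, and the names `simulate_pu`, `fit_soft_logistic`, and `pu_em`, are hypothetical choices made for this example and do not come from the paper.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def simulate_pu(n, beta_true, c, seed=0):
    """Toy PU data: y is the hidden fraud label, s is the observed flag.

    Assumption (illustrative only): a true positive is labeled with
    probability c regardless of its features; negatives are never labeled.
    """
    rng = np.random.default_rng(seed)
    d = len(beta_true)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
    y = rng.random(n) < sigmoid(X @ beta_true)
    s = (y & (rng.random(n) < c)).astype(int)
    return X, s

def fit_soft_logistic(X, w, beta, n_newton=25, ridge=1e-8):
    """M-step: Newton iterations for logistic regression with soft labels w."""
    for _ in range(n_newton):
        p = sigmoid(X @ beta)
        grad = X.T @ (w - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def pu_em(X, s, c, n_iter=500, tol=1e-6):
    """EM for PU logistic regression with a known labeling probability c."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        # E-step: labeled rows are positive for sure; for unlabeled rows,
        # P(y=1 | s=0, x) = (1 - c) p / (1 - c p) by Bayes' rule.
        w = np.where(s == 1, 1.0, (1.0 - c) * p / (1.0 - c * p))
        new_beta = fit_soft_logistic(X, w, beta)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

A possible usage is `X, s = simulate_pu(20000, np.array([-1.0, 1.5, -1.0]), c=0.5)` followed by `pu_em(X, s, c=0.5)`. Fitting an ordinary logistic regression directly to `s` treats every unlabeled transaction as legitimate, which biases the intercept downward; the E-step reweighting is what corrects for positives hidden among the unlabeled rows.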

Suggested Citation

  • Yongqin Qiu & Yuanxing Chen & Kan Fang & Lean Yu & Kuangnan Fang, 2025. "Fraud Detection by Integrating Multisource Heterogeneous Presence-Only Data," INFORMS Journal on Computing, INFORMS, vol. 37(4), pages 998-1017, July.
  • Handle: RePEc:inm:orijoc:v:37:y:2025:i:4:p:998-1017
    DOI: 10.1287/ijoc.2023.0366

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijoc.2023.0366
    Download Restriction: no

    File URL: https://libkey.io/10.1287/ijoc.2023.0366?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

