Authors
- Yongqin Qiu
(International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, China)
- Yuanxing Chen
(Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China)
- Kan Fang
(College of Management and Economics, Tianjin University, Tianjin 300072, China; and Laboratory of Computation and Analytics of Complex Management Systems (CACMS), Tianjin University, Tianjin 300072, China)
- Lean Yu
(Business School, Sichuan University, Chengdu 610065, China)
- Kuangnan Fang
(School of Economics, Xiamen University, Xiamen 361005, China)
Abstract
In credit fraud detection practice, certain fraudulent transactions often evade detection because of the hidden nature of fraudulent behavior. To address this issue, financial institutions are increasingly adopting positive-unlabeled (PU) learning techniques. However, most of these methods are designed for single data sets and do not account for the heterogeneity of data collected from different sources. In this paper, we propose an integrative PU learning method (I-PU) for pooling information from multiple heterogeneous PU data sets. A novel approach that penalizes group differences is developed to explicitly and automatically identify the cluster structures of coefficients across different data sets, thus offering a plausible interpretation of heterogeneity. Furthermore, we apply a bilevel selection method to detect the sparse structure at both the group level and the within-group level. Theoretically, we show that our proposed estimator has the oracle property. Computationally, we design an expectation-maximization (EM) algorithm framework and propose an alternating direction method of multipliers (ADMM) algorithm to solve it. Simulation results show that our proposed method has better numerical performance in terms of variable selection, parameter estimation, and prediction ability. Finally, a real-world application showcases the effectiveness of our method in identifying distinct coefficient clusters and its superior prediction performance compared with direct data merging or separate modeling. This result also offers valuable insights for financial institutions in developing targeted fraud detection systems.
Suggested Citation
Yongqin Qiu & Yuanxing Chen & Kan Fang & Lean Yu & Kuangnan Fang, 2025.
"Fraud Detection by Integrating Multisource Heterogeneous Presence-Only Data,"
INFORMS Journal on Computing, INFORMS, vol. 37(4), pages 998-1017, July.
Handle:
RePEc:inm:orijoc:v:37:y:2025:i:4:p:998-1017
DOI: 10.1287/ijoc.2023.0366