Printed from https://ideas.repec.org/a/gam/jecnmx/v11y2023i3p22-d1229353.html

Detecting Pump-and-Dumps with Crypto-Assets: Dealing with Imbalanced Datasets and Insiders’ Anticipated Purchases

Author

Listed:
  • Dean Fantazzini

    (Moscow School of Economics, Moscow State University, 119992 Moscow, Russia
    Faculty of Economic Sciences, Higher School of Economics, 109028 Moscow, Russia
    These authors contributed equally to this work.)

  • Yufeng Xiao

    (Moscow School of Economics, Moscow State University, 119992 Moscow, Russia
    These authors contributed equally to this work.)

Abstract

Detecting pump-and-dump schemes involving cryptoassets with high-frequency data is challenging due to imbalanced datasets and the early occurrence of unusual trading volumes. To address these issues, we propose constructing synthetic balanced datasets using resampling methods and flagging a pump-and-dump from the moment of public announcement up to 60 min beforehand. We validated our proposals using data from Pumpolymp and the CryptoCurrency eXchange Trading Library to identify 351 pump signals relative to the Binance crypto exchange in 2021 and 2022. We found that the most effective approach was using the original imbalanced dataset with pump-and-dumps flagged 60 min in advance, together with a random forest model with data segmented into 30-s chunks and regressors computed with a moving window of 1 h. Our analysis revealed that a better balance between sensitivity and specificity could be achieved by simply selecting an appropriate probability threshold, such as setting the threshold close to the observed prevalence in the original dataset. Resampling methods were useful in some cases, but threshold-independent measures were not affected. Moreover, detecting pump-and-dumps in real-time involves high-dimensional data, and the use of resampling methods to build synthetic datasets can be time-consuming, making them less practical.
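
Two ideas in the abstract can be illustrated with a minimal Python sketch: flagging the 60 min before the public announcement as positives, and cutting predicted probabilities at the observed prevalence rather than at 0.5. This is not the authors' code. The function names, the toy timeline, and the made-up probabilities are all assumptions for illustration, and the paper's random forest and regressors are not reproduced here.

```python
from typing import List, Sequence, Tuple


def flag_chunks(chunk_starts: Sequence[int],
                announcements: Sequence[int],
                lead_seconds: int = 3600) -> List[int]:
    """Label a chunk 1 if it starts within `lead_seconds` before an
    announcement (announcement time included), else 0. Times in seconds."""
    return [
        int(any(a - lead_seconds <= t <= a for a in announcements))
        for t in chunk_starts
    ]


def sensitivity_specificity(labels: Sequence[int],
                            probs: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Classify as positive when prob >= threshold, then return
    (sensitivity, specificity); NaN if a class is absent."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probs):
        predicted = p >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy timeline (assumption): 4 hours of 30-s chunks, one announcement at t = 7200 s.
chunk_starts = list(range(0, 4 * 3600, 30))
labels = flag_chunks(chunk_starts, announcements=[7200])
prevalence = sum(labels) / len(labels)

# Made-up probabilities (assumption): positives score higher than negatives,
# but still below 0.5, as often happens when the positive class is rare.
probs = [0.30 if y == 1 else 0.05 for y in labels]

for threshold in (0.5, prevalence):
    sens, spec = sensitivity_specificity(labels, probs, threshold)
    print(f"threshold={threshold:.3f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy setup the 0.5 threshold misses every flagged chunk, while the prevalence threshold recovers them. The perfect separation is an artifact of the made-up scores; with real data the prevalence would be far smaller and the trade-off between sensitivity and specificity would not be this clean.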

Suggested Citation

  • Dean Fantazzini & Yufeng Xiao, 2023. "Detecting Pump-and-Dumps with Crypto-Assets: Dealing with Imbalanced Datasets and Insiders’ Anticipated Purchases," Econometrics, MDPI, vol. 11(3), pages 1-73, August.
  • Handle: RePEc:gam:jecnmx:v:11:y:2023:i:3:p:22-:d:1229353

    Download full text from publisher

    File URL: https://www.mdpi.com/2225-1146/11/3/22/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2225-1146/11/3/22/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sihao Hu & Zhen Zhang & Shengliang Lu & Bingsheng He & Zhao Li, 2022. "Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump," Papers 2204.12929, arXiv.org, revised Apr 2023.
    2. Taro Tsuchiya, 2021. "Profitability of cryptocurrency Pump and Dump schemes," Digital Finance, Springer, vol. 3(2), pages 149-167, June.
    3. Mohammad Javad Rajaei & Qusay H. Mahmoud, 2023. "A Survey on Pump and Dump Detection in the Cryptocurrency Market Using Machine Learning," Future Internet, MDPI, vol. 15(8), pages 1-17, August.
    4. Kaihua Qin & Liyi Zhou & Yaroslav Afonin & Ludovico Lazzaretti & Arthur Gervais, 2021. "CeFi vs. DeFi -- Comparing Centralized to Decentralized Finance," Papers 2106.08157, arXiv.org, revised Jun 2021.
    5. David Ardia & Keven Bluteau, 2023. "The Role of Twitter in Cryptocurrency Pump-and-Dumps," Papers 2306.02148, arXiv.org.
    6. Lars Hornuf & Paul P. Momtaz & Rachel J. Nam & Ye Yuan, 2023. "Cybercrime on the Ethereum Blockchain," CESifo Working Paper Series 10598, CESifo.
    7. Angel M. Morales & Patrick Tarwater & Indika Mallawaarachchi & Alok Kumar Dwivedi & Juan B. Figueroa-Casas, 2015. "Multinomial logistic regression approach for the evaluation of binary diagnostic test in medical research," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 16(2), pages 203-222, June.
    8. F. Gauthier & D. Germain & B. Hétu, 2017. "Logistic models as a forecasting tool for snow avalanches in a cold maritime climate: northern Gaspésie, Québec, Canada," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 89(1), pages 201-232, October.
    9. Douglas Cumming & Lars Hornuf & Moein Karami & Denis Schweizer, 2023. "Disentangling Crowdfunding from Fraudfunding," Journal of Business Ethics, Springer, vol. 182(4), pages 1103-1128, February.
    10. Eunae Yoo & Elliot Rabinovich & Bin Gu, 2020. "The Growth of Follower Networks on Social Media Platforms for Humanitarian Operations," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2696-2715, December.
    11. Cemal Eren Arbatlı & Quamrul H. Ashraf & Oded Galor & Marc Klemp, 2020. "Diversity and Conflict," Econometrica, Econometric Society, vol. 88(2), pages 727-797, March.
    12. Lo Turco, Alessia & Maggioni, Daniela, 2018. "Effects of Islamic religiosity on bilateral trust in trade: The case of Turkish exports," Journal of Comparative Economics, Elsevier, vol. 46(4), pages 947-965.
    13. Matija Kovacic & Claudio Zoli, 2021. "Ethnic distribution, effective power and conflict," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 57(2), pages 257-299, August.
    14. Blackman, Allen & Guerrero, Santiago, 2012. "What drives voluntary eco-certification in Mexico?," Journal of Comparative Economics, Elsevier, vol. 40(2), pages 256-268.
    15. Jacob Ausderan, 2018. "Reassessing the democratic advantage in interstate wars using k-adic datasets," Conflict Management and Peace Science, Peace Science Society (International), vol. 35(5), pages 451-473, September.
    16. Paul Poast, 2013. "Issue linkage and international cooperation: An empirical investigation," Conflict Management and Peace Science, Peace Science Society (International), vol. 30(3), pages 286-303, July.
    17. Václavík, Tomáš & Meentemeyer, Ross K., 2009. "Invasive species distribution modeling (iSDM): Are absence data and dispersal constraints needed to predict actual distributions?," Ecological Modelling, Elsevier, vol. 220(23), pages 3248-3258.
    18. Yerko Rojas, 2017. "Evictions and short-term all-cause mortality: a 3-year follow-up study of a middle-aged Swedish population," International Journal of Public Health, Springer;Swiss School of Public Health (SSPH+), vol. 62(3), pages 343-351, April.
    19. Mehrez Ben Slama & Dhafer Saidane & Hassouna Fedhila, 2012. "How to identify targets in the M&A banking operations? Case of cross-border strategies in Europe by line of activity," Review of Quantitative Finance and Accounting, Springer, vol. 38(2), pages 209-240, February.
    20. Marcin Chlebus, 2014. "One-day prediction of state of turbulence for financial instrument based on models for binary dependent variable," Ekonomia journal, Faculty of Economic Sciences, University of Warsaw, vol. 37.

    More about this item

    Keywords

    pump-and-dump; crypto-assets; minority class; class imbalance; machine learning; random forests

    JEL classification:

    • C14 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Semiparametric and Nonparametric Methods: General
    • C25 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions; Probabilities
    • C35 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C51 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Construction and Estimation
    • C53 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Forecasting and Prediction Models; Simulation Methods
    • C58 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Financial Econometrics
    • G17 - Financial Economics - - General Financial Markets - - - Financial Forecasting and Simulation
    • G32 - Financial Economics - - Corporate Finance and Governance - - - Financing Policy; Financial Risk and Risk Management; Capital and Ownership Structure; Value of Firms; Goodwill
    • K42 - Law and Economics - - Legal Procedure, the Legal System, and Illegal Behavior - - - Illegal Behavior and the Enforcement of Law
