
Synthetic Data Applications in Finance

Author

Listed:
  • Vamsi K. Potluru
  • Daniel Borrajo
  • Andrea Coletta
  • Niccolò Dalmasso
  • Yousef El-Laham
  • Elizabeth Fons
  • Mohsen Ghassemi
  • Sriram Gopalakrishnan
  • Vikesh Gosai
  • Eleonora Kreačić
  • Ganapathy Mani
  • Saheed Obitayo
  • Deepak Paramanand
  • Natraj Raman
  • Mikhail Solonin
  • Srijan Sood
  • Svitlana Vyetrenko
  • Haibei Zhu
  • Manuela Veloso
  • Tucker Balch

Abstract

Synthetic data has made tremendous strides in various commercial settings including finance, healthcare, and virtual reality. We present a broad overview of prototypical applications of synthetic data in the financial sector and, in particular, provide richer details for a few select ones. These cover a wide variety of data modalities, including tabular, time-series, event-series, and unstructured data, arising from both markets and retail financial applications. Since finance is a highly regulated industry, synthetic data is a potential approach for dealing with issues related to privacy, fairness, and explainability. Various metrics are used to evaluate the quality and effectiveness of our approaches in these applications. We conclude with open directions for synthetic data in the financial domain.

Suggested Citation

  • Vamsi K. Potluru & Daniel Borrajo & Andrea Coletta & Niccolò Dalmasso & Yousef El-Laham & Elizabeth Fons & Mohsen Ghassemi & Sriram Gopalakrishnan & Vikesh Gosai & Eleonora Kreačić & Ganapathy Ma, 2023. "Synthetic Data Applications in Finance," Papers 2401.00081, arXiv.org, revised Mar 2024.
  • Handle: RePEc:arx:papers:2401.00081

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2401.00081
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Andrea Coletta & Joseph Jerome & Rahul Savani & Svitlana Vyetrenko, 2023. "Conditional Generators for Limit Order Book Environments: Explainability, Challenges, and Robustness," Papers 2306.12806, arXiv.org.
    2. Magnus Wiese & Robert Knobloch & Ralf Korn & Peter Kretschmer, 2020. "Quant GANs: deep generation of financial time series," Quantitative Finance, Taylor & Francis Journals, vol. 20(9), pages 1419-1440, September.
    3. Brian Kenji Iwana & Seiichi Uchida, 2021. "An empirical survey of data augmentation for time series classification with neural networks," PLOS ONE, Public Library of Science, vol. 16(7), pages 1-32, July.
    4. Douglas J. White, 1985. "Real Applications of Markov Decision Processes," Interfaces, INFORMS, vol. 15(6), pages 73-83, December.
    5. Thomas Hegghammer, 2022. "OCR with Tesseract, Amazon Textract, and Google Document AI: a benchmarking experiment," Journal of Computational Social Science, Springer, vol. 5(1), pages 861-882, May.
    6. Nicole Bäuerle & Jonathan Ott, 2011. "Markov Decision Processes with Average-Value-at-Risk criteria," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 74(3), pages 361-379, December.
    7. Yosihiko Ogata, 1998. "Space-Time Point-Process Models for Earthquake Occurrences," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 50(2), pages 379-402, June.
    8. Dimitris N. Chorafas, 1995. "Financial Models and Simulation," Palgrave Macmillan Books, Palgrave Macmillan, number 978-0-230-37483-6, November.
    9. Isham, Valerie & Westcott, Mark, 1979. "A self-correcting point process," Stochastic Processes and their Applications, Elsevier, vol. 8(3), pages 335-347, May.
    10. Chiang, Wen-Hao & Liu, Xueying & Mohler, George, 2022. "Hawkes process modeling of COVID-19 with mobility leading indicators and spatial covariates," International Journal of Forecasting, Elsevier, vol. 38(2), pages 505-520.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Philip A. White & Alan E. Gelfand, 2021. "Generalized Evolutionary Point Processes: Model Specifications and Model Comparison," Methodology and Computing in Applied Probability, Springer, vol. 23(3), pages 1001-1021, September.
    2. Song Wei & Andrea Coletta & Svitlana Vyetrenko & Tucker Balch, 2023. "INTAGS: Interactive Agent-Guided Simulation," Papers 2309.01784, arXiv.org, revised Nov 2023.
    3. Bokai Cao & Xueyuan Lin & Yiyan Qi & Chengjin Xu & Cehao Yang & Jian Guo, 2025. "Financial Wind Tunnel: A Retrieval-Augmented Market Simulator," Papers 2503.17909, arXiv.org.
    4. Bokai Cao & Saizhuo Wang & Xinyi Lin & Xiaojun Wu & Haohan Zhang & Lionel M. Ni & Jian Guo, 2025. "From Deep Learning to LLMs: A survey of AI in Quantitative Investment," Papers 2503.21422, arXiv.org.
    5. D. Gospodinov & V. Karakostas & E. Papadimitriou, 2015. "Seismicity rate modeling for prospective stochastic forecasting: the case of 2014 Kefalonia, Greece, seismic excitation," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 79(2), pages 1039-1058, November.
    6. Scalas, Enrico & Kaizoji, Taisei & Kirchler, Michael & Huber, Jürgen & Tedeschi, Alessandra, 2006. "Waiting times between orders and trades in double-auction markets," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 366(C), pages 463-471.
    7. Michael Karpe, 2020. "An overall view of key problems in algorithmic trading and recent progress," Papers 2006.05515, arXiv.org.
    8. Huang, Lorick & Khabou, Mahmoud, 2023. "Nonlinear Poisson autoregression and nonlinear Hawkes processes," Stochastic Processes and their Applications, Elsevier, vol. 161(C), pages 201-241.
    9. Solveig Flaig & Gero Junike, 2022. "Scenario Generation for Market Risk Models Using Generative Neural Networks," Risks, MDPI, vol. 10(11), pages 1-28, October.
    10. Blanka Horvath & Josef Teichmann & Žan Žurič, 2021. "Deep Hedging under Rough Volatility," Risks, MDPI, vol. 9(7), pages 1-20, July.
    11. Alexandre Miot, 2020. "Adversarial trading," Papers 2101.03128, arXiv.org.
    12. Steffen Volkenand & Günther Filler & Martin Odening, 2020. "Price Discovery and Market Reflexivity in Agricultural Futures Contracts with Different Maturities," Risks, MDPI, vol. 8(3), pages 1-17, July.
    13. Dewei Wang & Chendi Jiang & Chanseok Park, 2019. "Reliability analysis of load-sharing systems with memory," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(2), pages 341-360, April.
    14. G. Cleanthous & Athanasios G. Georgiadis & P. A. White, 2025. "Pointwise density estimation on metric spaces and applications in seismology," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 88(2), pages 119-148, February.
    15. Myladis R. Cogollo & Gilberto González-Parra & Abraham J. Arenas, 2021. "Modeling and Forecasting Cases of RSV Using Artificial Neural Networks," Mathematics, MDPI, vol. 9(22), pages 1-20, November.
    16. Jamie Olson & Kathleen Carley, 2013. "Exact and approximate EM estimation of mutually exciting hawkes processes," Statistical Inference for Stochastic Processes, Springer, vol. 16(1), pages 63-80, April.
    17. Kuroda, Kaori & Hashiguchi, Hiroki & Fujiwara, Kantaro & Ikeguchi, Tohru, 2014. "Reconstruction of network structures from marked point processes using multi-dimensional scaling," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 415(C), pages 194-204.
    18. van den Hengel, G. & Franses, Ph.H.B.F., 2018. "Forecasting social conflicts in Africa using an Epidemic Type Aftershock Sequence model," Econometric Institute Research Papers EI2018-31, Erasmus University Rotterdam, Erasmus School of Economics (ESE), Econometric Institute.
    19. Mohammad Shahin & F. Frank Chen & Ali Hosseinzadeh, 2024. "Machine-based identification system via optical character recognition," Flexible Services and Manufacturing Journal, Springer, vol. 36(2), pages 453-480, June.
    20. Edmond Lezmi & Jules Roche & Thierry Roncalli & Jiali Xu, 2020. "Improving the Robustness of Trading Strategy Backtesting with Boltzmann Machines and Generative Adversarial Networks," Papers 2007.04838, arXiv.org.
