
Survey on Synthetic Data Generation, Evaluation Methods and GANs

Author

Listed:
  • Alvaro Figueira

    (CRACS-INESC TEC, University of Porto, 4169-007 Porto, Portugal
    These authors contributed equally to this work.)

  • Bruno Vaz

    (Faculty of Sciences, University of Porto, Rua do Campo Alegre, s/n, 4169-007 Porto, Portugal
    These authors contributed equally to this work.)

Abstract

Synthetic data consists of artificially generated data. When data are scarce or of poor quality, synthetic data can be used, for example, to improve the performance of machine learning models. Generative adversarial networks (GANs) are state-of-the-art deep generative models that can generate novel synthetic samples that follow the underlying data distribution of the original dataset. Reviews on synthetic data generation and on GANs have already been written. However, to the best of our knowledge, none in the relevant literature has explicitly combined these two topics. This survey aims to fill this gap: it combines synthetic data generation and GANs and is intended as a solid starting point for new researchers in the field, giving them a general overview of the key contributions and useful references. We conducted a review of the state of the art by querying four major databases: Web of Science (WoS), Scopus, IEEE Xplore, and ACM Digital Library. This allowed us to gain insights into the most relevant authors, the most relevant scientific journals in the area, the most cited papers, the most significant research areas, the most important institutions, and the most relevant GAN architectures. We thoroughly reviewed GANs, their most common training problems, and their most important breakthroughs, with a focus on GAN architectures for tabular data. We also present the main algorithms for generating synthetic data, their applications, and our thoughts on these methods. Finally, we reviewed the main techniques for evaluating the quality of synthetic data (especially tabular data) and provided a schematic overview of the information presented in this paper.

Suggested Citation

  • Alvaro Figueira & Bruno Vaz, 2022. "Survey on Synthetic Data Generation, Evaluation Methods and GANs," Mathematics, MDPI, vol. 10(15), pages 1-41, August.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:15:p:2733-:d:878537

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/15/2733/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/15/2733/
    Download Restriction: no

    References listed on IDEAS

    1. Sergey I. Nikolenko, 2021. "Synthetic Data for Deep Learning," Springer Optimization and Its Applications, Springer, number 978-3-030-75178-4, September.
    2. Yuan Zhou & Fang Dong & Yufei Liu & Zhaofu Li & JunFei Du & Li Zhang, 2020. "Forecasting emerging technologies using data augmentation and deep learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(1), pages 1-29, April.
    3. Sergey I. Nikolenko, 2021. "Synthetic Data Outside Computer Vision," Springer Optimization and Its Applications, in: Synthetic Data for Deep Learning, chapter 0, pages 217-226, Springer.

    Citations

    Cited by:

    1. Chen, Zhiqiang & Li, Jianbin & Cheng, Long & Liu, Xiufeng, 2023. "Federated-WDCGAN: A federated smart meter data sharing framework for privacy preservation," Applied Energy, Elsevier, vol. 334(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dejing Kong & Jianzhong Yang & Lingfeng Li, 2020. "Early identification of technological convergence in numerical control machine tool: a deep learning approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 1983-2009, December.
    2. Zamani, Mehdi & Yalcin, Haydar & Naeini, Ali Bonyadi & Zeba, Gordana & Daim, Tugrul U, 2022. "Developing metrics for emerging technologies: identification and assessment," Technological Forecasting and Social Change, Elsevier, vol. 176(C).
    3. June Young Lee & Sejung Ahn & Dohyun Kim, 2021. "Deep learning-based prediction of future growth potential of technologies," PLOS ONE, Public Library of Science, vol. 16(6), pages 1-16, June.
    4. Yunlei Lin & Yuan Zhou, 2023. "Identification of Hydrogen-Energy-Related Emerging Technologies Based on Text Mining," Sustainability, MDPI, vol. 16(1), pages 1-19, December.
    5. Myoungjae Choi & Ohjin Kwon & Dongkyu Won & Wooseok Jang, 2021. "Identifying the Policy Direction of National R&D Programs Based on Data Envelopment Analysis and Diversity Index Approach," Sustainability, MDPI, vol. 13(22), pages 1-17, November.
    6. Guannan Xu & Weijie Hu & Yuanyuan Qiao & Yuan Zhou, 2020. "Mapping an innovation ecosystem using network clustering and community identification: a multi-layered framework," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 2057-2081, September.
    7. Yuan Zhou & Fang Dong & Yufei Liu & Liang Ran, 2021. "A deep learning framework to early identify emerging technologies in large-scale outlier patents: an empirical study of CNC machine tool," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 969-994, February.
    8. Hasan Tercan & Tobias Meisen, 2022. "Machine learning and deep learning based predictive quality in manufacturing: a systematic review," Journal of Intelligent Manufacturing, Springer, vol. 33(7), pages 1879-1905, October.
    9. Ryosuke L. Ohniwa & Kunio Takeyasu & Aiko Hibino, 2022. "Researcher dynamics in the generation of emerging topics in life sciences and medicine," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 871-884, February.
    10. Li Yao & He Ni, 2023. "Prediction of patent grant and interpreting the key determinants: an application of interpretable machine learning approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 4933-4969, September.
    11. Liang Chen & Shuo Xu & Lijun Zhu & Jing Zhang & Xiaoping Lei & Guancan Yang, 2020. "A deep learning based method for extracting semantic information from patent documents," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 289-312, October.
    12. Puccetti, Giovanni & Giordano, Vito & Spada, Irene & Chiarello, Filippo & Fantoni, Gualtiero, 2023. "Technology identification from patent texts: A novel named entity recognition method," Technological Forecasting and Social Change, Elsevier, vol. 186(PB).
    13. Gozuacik, Necip & Sakar, C. Okan & Ozcan, Sercan, 2023. "Technological forecasting based on estimation of word embedding matrix using LSTM networks," Technological Forecasting and Social Change, Elsevier, vol. 191(C).
    14. Huailan Liu & Zhiwang Chen & Jie Tang & Yuan Zhou & Sheng Liu, 2020. "Mapping the technology evolution path: a novel model for dynamic topic detection and tracking," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2043-2090, December.
    15. Ravil I. Mukhamediev & Yelena Popova & Yan Kuchin & Elena Zaitseva & Almas Kalimoldayev & Adilkhan Symagulov & Vitaly Levashenko & Farida Abdoldina & Viktors Gopejenko & Kirill Yakunin & Elena Muhamed, 2022. "Review of Artificial Intelligence and Machine Learning Technologies: Classification, Restrictions, Opportunities and Challenges," Mathematics, MDPI, vol. 10(15), pages 1-25, July.
    16. Shouji Fujimoto & Atushi Ishikawa & Takayuki Mizuno, 2022. "Copula-Based Synthetic Data Generation in Firm-Size Variables," The Review of Socionetwork Strategies, Springer, vol. 16(2), pages 479-492, October.
    17. Delgado, Guillem & Cortés, Andoni & García, Sara & Loyo, Estíbaliz & Berasategi, Maialen & Aranjuelo, Nerea, 2023. "Methodology for generating synthetic labeled datasets for visual container inspection," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 175(C).
    18. Hain, Daniel S. & Jurowetzki, Roman & Buchmann, Tobias & Wolf, Patrick, 2022. "A text-embedding-based approach to measuring patent-to-patent technological similarity," Technological Forecasting and Social Change, Elsevier, vol. 177(C).
    19. Zhai, Dongsheng & Zhai, Liang & Li, Mengyang & He, Xijun & Xu, Shuo & Wang, Feifei, 2022. "Patent representation learning with a novel design of patent ontology: Case study on PEM patents," Technological Forecasting and Social Change, Elsevier, vol. 183(C).
    20. Sonan Memon, 2022. "Inflation in Pakistan: High-Frequency Estimation and Forecasting," PIDE-Working Papers 2022:12, Pakistan Institute of Development Economics.
