
Synthetic Dataset Generation of Driver Telematics

Authors

  • Banghee So

    (Department of Mathematics, University of Connecticut, 341 Mansfield Road, Storrs, CT 06269-1009, USA)

  • Jean-Philippe Boucher

    (Département de Mathématiques, Université du Québec à Montréal, 201, Avenue du Président-Kennedy, Montréal, QC H2X 3Y7, Canada)

  • Emiliano A. Valdez

    (Department of Mathematics, University of Connecticut, 341 Mansfield Road, Storrs, CT 06269-1009, USA)

Abstract

This article describes the techniques used to produce a synthetic dataset of driver telematics emulated from a similar real insurance dataset. The synthetic dataset has 100,000 policies and includes observations on drivers' claims experience, together with associated classical risk variables and telematics-related variables. The work aims to produce a resource that can be used to advance models for assessing risk in usage-based insurance. It follows a three-stage process that uses machine learning algorithms. In the first stage, a synthetic portfolio of the space of feature variables is generated by applying an extended SMOTE algorithm. In the second stage, values for the number of claims are simulated as multiple binary classifications using feedforward neural networks. In the third stage, values for the aggregated amount of claims are simulated as a regression using feedforward neural networks, with the number of claims included in the set of feature variables. The resulting dataset is evaluated by comparing the synthetic and real datasets when Poisson and gamma regression models are fitted to the respective data. Other visualizations and data summaries produce remarkably similar statistics between the two datasets. We hope that researchers interested in obtaining telematics datasets to calibrate models or learning algorithms will find our work valuable.
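
The first stage of the abstract can be illustrated with a minimal sketch of the basic SMOTE interpolation step, in which a synthetic point is drawn on the line segment between a real observation and one of its nearest neighbours. This is not the extended SMOTE algorithm used by the authors, whose details are not given in the abstract. The function name smote_sample, its parameters, and the toy feature values are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sample(rows, k=3, n_new=5, seed=0):
    """Basic SMOTE idea (illustrative only): create n_new synthetic rows by
    interpolating between a random row and one of its k nearest neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        # k nearest neighbours of the base row (Euclidean), excluding itself
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> partner
        synthetic.append(
            tuple(b + gap * (p - b) for b, p in zip(base, partner))
        )
    return synthetic


# Hypothetical toy portfolio: (driver age, annual distance in thousands of km)
real_rows = [(25.0, 12.0), (31.0, 8.5), (47.0, 15.2), (52.0, 6.1), (38.0, 20.3)]
synthetic_rows = smote_sample(real_rows, k=2, n_new=4)
```

Because each synthetic row is a convex combination of two real rows, every generated feature value stays within the range observed in the real data. A full pipeline in the spirit of the abstract would then simulate claim counts and claim amounts for these rows with separately trained models.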

Suggested Citation

  • Banghee So & Jean-Philippe Boucher & Emiliano A. Valdez, 2021. "Synthetic Dataset Generation of Driver Telematics," Risks, MDPI, vol. 9(4), pages 1-19, March.
  • Handle: RePEc:gam:jrisks:v:9:y:2021:i:4:p:58-:d:523212

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-9091/9/4/58/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-9091/9/4/58/
    Download Restriction: no

    References listed on IDEAS

    1. Roel Verbelen & Katrien Antonio & Gerda Claeskens, 2018. "Unravelling the predictive power of telematics data in car insurance pricing," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 67(5), pages 1275-1304, November.
    2. Dalkilic, Turkan Erbay & Tank, Fatih & Kula, Kamile Sanli, 2009. "Neural networks approach for determining total claim amounts in insurance," Insurance: Mathematics and Economics, Elsevier, vol. 45(2), pages 236-241, October.
    3. Mercedes Ayuso & Montserrat Guillen & Jens Perch Nielsen, 2019. "Improving automobile insurance ratemaking using telematics: incorporating mileage and driver behaviour data," Transportation, Springer, vol. 46(3), pages 735-752, June.
    4. Jean-Philippe Boucher & Michel Denuit & Montserrat Guillén, 2007. "Risk Classification for Claim Counts," North American Actuarial Journal, Taylor & Francis Journals, vol. 11(4), pages 110-131.
    5. Andrea Gabrielli & Mario V. Wüthrich, 2018. "An Individual Claims History Simulation Machine," Risks, MDPI, vol. 6(2), pages 1-32, March.
    6. Montserrat Guillen & Jens Perch Nielsen & Ana M. Pérez-Marín & Valandis Elpidorou, 2020. "Can Automobile Insurance Telematics Predict the Risk of Near-Miss Events?," North American Actuarial Journal, Taylor & Francis Journals, vol. 24(1), pages 141-152, January.
    7. Montserrat Guillen & Jens Perch Nielsen & Mercedes Ayuso & Ana M. Pérez‐Marín, 2019. "The Use of Telematics Devices to Improve Automobile Insurance Rates," Risk Analysis, John Wiley & Sons, vol. 39(3), pages 662-672, March.

    Citations


    Cited by:

    1. Shengkun Xie & Kun Shi, 2023. "Generalised Additive Modelling of Auto Insurance Data with Territory Design: A Rate Regulation Perspective," Mathematics, MDPI, vol. 11(2), pages 1-24, January.
    2. Simon, Pierre-Alexandre & Trufin, Julien & Denuit, Michel, 2023. "Bivariate Poisson credibility model and bonus-malus scale for claim and near-claim events," LIDAM Discussion Papers ISBA 2023014, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhiyu Quan & Changyue Hu & Panyi Dong & Emiliano A. Valdez, 2024. "Improving Business Insurance Loss Models by Leveraging InsurTech Innovation," Papers 2401.16723, arXiv.org.
    2. Marjan Qazvini, 2019. "On the Validation of Claims with Excess Zeros in Liability Insurance: A Comparative Study," Risks, MDPI, vol. 7(3), pages 1-17, June.
    3. Francis Duval & Jean‐Philippe Boucher & Mathieu Pigeon, 2023. "Enhancing claim classification with feature extraction from anomaly‐detection‐derived routine and peculiarity profiles," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 90(2), pages 421-458, June.
    4. Jennifer S. K. Chan & S. T. Boris Choy & Udi Makov & Ariel Shamir & Vered Shapovalov, 2022. "Variable Selection Algorithm for a Mixture of Poisson Regression for Handling Overdispersion in Claims Frequency Modeling Using Telematics Car Driving Data," Risks, MDPI, vol. 10(4), pages 1-10, April.
    5. Donatella Porrini & Giulio Fusco & Cosimo Magazzino, 2020. "Black boxes and market efficiency: the effect on premiums in the Italian motor-vehicle insurance market," European Journal of Law and Economics, Springer, vol. 49(3), pages 455-472, June.
    6. Jean-Philippe Boucher & Roxane Turcotte, 2020. "A Longitudinal Analysis of the Impact of Distance Driven on the Probability of Car Accidents," Risks, MDPI, vol. 8(3), pages 1-19, September.
    7. Shengkun Xie, 2021. "Improving Explainability of Major Risk Factors in Artificial Neural Networks for Auto Insurance Rate Regulation," Risks, MDPI, vol. 9(7), pages 1-21, July.
    8. Deprez, Laurens & Antonio, Katrien & Boute, Robert, 2021. "Pricing service maintenance contracts using predictive analytics," European Journal of Operational Research, Elsevier, vol. 290(2), pages 530-545.
    9. Ramon Alemany & Catalina Bolancé & Roberto Rodrigo & Raluca Vernic, 2020. "Bivariate Mixed Poisson and Normal Generalised Linear Models with Sarmanov Dependence—An Application to Model Claim Frequency and Optimal Transformed Average Severity," Mathematics, MDPI, vol. 9(1), pages 1-18, December.
    10. Alfiero, Simona & Battisti, Enrico & Hadjielias, Elias, 2022. "Black box technology, usage-based insurance, and prediction of purchase behavior: Evidence from the auto insurance sector," Technological Forecasting and Social Change, Elsevier, vol. 183(C).
    11. Lluís Bermúdez & Dimitris Karlis & Isabel Morillo, 2020. "Modelling Unobserved Heterogeneity in Claim Counts Using Finite Mixture Models," Risks, MDPI, vol. 8(1), pages 1-13, January.
    12. Denuit, Michel & Guillen, Montserrat & Trufin, Julien, 2018. "Multivariate credibility modeling for usage-based motor insurance pricing with behavioral data," LIDAM Discussion Papers ISBA 2018032, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    13. Montserrat Guillen & Jens Perch Nielsen & Ana M. Pérez‐Marín, 2021. "Near‐miss telematics in motor insurance," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 88(3), pages 569-589, September.
    14. Yves Staudt & Joël Wagner, 2021. "Assessing the Performance of Random Forests for Modeling Claim Severity in Collision Car Insurance," Risks, MDPI, vol. 9(3), pages 1-28, March.
    15. Gao, Lisa & Shi, Peng, 2022. "Leveraging high-resolution weather information to predict hail damage claims: A spatial point process for replicated point patterns," Insurance: Mathematics and Economics, Elsevier, vol. 107(C), pages 161-179.
    16. Christopher Blier-Wong & Hélène Cossette & Luc Lamontagne & Etienne Marceau, 2020. "Machine Learning in P&C Insurance: A Review for Pricing and Reserving," Risks, MDPI, vol. 9(1), pages 1-26, December.
    17. Montserrat Guillen & Ana M. Pérez-Marín & Mercedes Ayuso & Jens Perch Nielsen, 2018. "Exposure to risk increases the excess of zero accident claims frequency in automobile insurance," IREA Working Papers 201810, University of Barcelona, Research Institute of Applied Economics, revised May 2018.
    18. Peng Shi & Glenn M. Fung & Daniel Dickinson, 2022. "Assessing hail risk for property insurers with a dependent marked point process," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(1), pages 302-328, January.
    19. Montserrat Guillen & Jens Perch Nielsen & Mercedes Ayuso & Ana M. Pérez‐Marín, 2019. "The Use of Telematics Devices to Improve Automobile Insurance Rates," Risk Analysis, John Wiley & Sons, vol. 39(3), pages 662-672, March.
    20. Meng, Shengwang & Gao, Yaqian & Huang, Yifan, 2022. "Actuarial intelligence in auto insurance: Claim frequency modeling with driving behavior features and improved boosted trees," Insurance: Mathematics and Economics, Elsevier, vol. 106(C), pages 115-127.
