Printed from https://ideas.repec.org/a/eee/csdana/v176y2022ics0167947322001414.html

Vine copula statistical disclosure control for mixed-type data

Authors

  • Chu, Amanda M.Y.
  • Ip, Chun Yin
  • Lam, Benson S.Y.
  • So, Mike K.P.

Abstract

In this paper, we develop a new statistical disclosure control (SDC) method for mixed-type data based on vine copulas. Gaussian and skew-t copulas have been shown to incorporate information from the marginal distributions of mixed-type variables, whether discrete or continuous. Our proposed vine-copula SDC method (vine-SDC) generalizes a data perturbation method based on an extended skew-t copula. It improves on that method by allowing the bivariate copulas in the vine decomposition to take various forms, which gives a better fit to the joint distribution of the data and more flexibility in data perturbation. A further advantage of vine-SDC is a significant gain in computational efficiency over the extended skew-t copula method. We discuss some statistical properties of vine copulas and the methodology of vine-SDC. A simulation study and a study of real healthcare survey data are provided to explore the performance and strengths of vine-SDC and to compare it with a common copula-based SDC method.
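The key idea in the abstract, building a joint distribution from bivariate copulas that may belong to different families, can be illustrated with a small sketch. The Python code (assuming NumPy and SciPy are available) samples from a hypothetical three-dimensional D-vine in which the tree-1 pair copulas are Clayton for variables (1, 2) and Gaussian for (2, 3), and the tree-2 copula for (1, 3 | 2) is Gaussian, using the standard conditional-distribution (h-function) recursion for pair-copula constructions (Aas et al., 2009). It is not the authors' vine-SDC procedure: the function names, copula families, and parameter values are illustrative assumptions, and an actual SDC application would first fit a vine to the confidential data and then use it to perturb the records.

```python
import numpy as np
from scipy.stats import kendalltau, norm, poisson

# Illustrative sketch only: a 3-dimensional D-vine (order 1-2-3) whose pair
# copulas come from different families. All names and parameter values are
# hypothetical; this is NOT the vine-SDC implementation from the paper.


def h_gauss(u, v, rho):
    """Conditional CDF C(u | v) of a bivariate Gaussian copula."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1.0 - rho**2))


def hinv_gauss(w, v, rho):
    """Inverse of h_gauss in its first argument: solves C(u | v) = w for u."""
    return norm.cdf(norm.ppf(w) * np.sqrt(1.0 - rho**2) + rho * norm.ppf(v))


def h_clayton(u, v, theta):
    """Conditional CDF C(u | v) = dC(u, v)/dv of a Clayton copula, theta > 0."""
    return v ** (-theta - 1.0) * (u**-theta + v**-theta - 1.0) ** (-1.0 - 1.0 / theta)


def hinv_clayton(w, v, theta):
    """Closed-form inverse of h_clayton in its first argument."""
    s = (w * v ** (theta + 1.0)) ** (-theta / (1.0 + theta))
    return (s + 1.0 - v**-theta) ** (-1.0 / theta)


def sample_dvine3(n, theta12, rho23, rho13_2, rng):
    """Draw n points on [0, 1]^3 from the D-vine with
    C12 = Clayton(theta12), C23 = Gaussian(rho23), C13|2 = Gaussian(rho13_2)."""
    w = rng.uniform(size=(n, 3))  # independent uniforms
    u1 = w[:, 0]
    # Tree 1: draw u2 given u1 through the Clayton pair copula C12.
    u2 = hinv_clayton(w[:, 1], u1, theta12)
    # Tree 2: invert C13|2 at F(u1 | u2) to obtain F(u3 | u2) ...
    f1_given_2 = h_clayton(u1, u2, theta12)
    f3_given_2 = hinv_gauss(w[:, 2], f1_given_2, rho13_2)
    # ... then invert the Gaussian pair copula C23 to recover u3.
    u3 = hinv_gauss(f3_given_2, u2, rho23)
    return np.column_stack([u1, u2, u3])


if __name__ == "__main__":
    rng = np.random.default_rng(2022)
    u = sample_dvine3(20_000, theta12=2.0, rho23=0.6, rho13_2=0.3, rng=rng)
    # Kendall's tau should be near theta/(theta+2) = 0.5 for Clayton(2)
    # and near (2/pi)*arcsin(0.6), roughly 0.41, for Gaussian(0.6).
    print("tau(1,2):", kendalltau(u[:, 0], u[:, 1])[0])
    print("tau(2,3):", kendalltau(u[:, 1], u[:, 2])[0])
    # Mixed-type data: map the third margin to a discrete (Poisson) variable.
    counts = poisson.ppf(u[:, 2], mu=3.0)
    print("first counts:", counts[:5])
```

Because each pair copula is specified separately, replacing the Clayton copula with another family changes the dependence (for example, the tail behaviour) of that pair only. The last lines show one common way to attach a discrete margin: pass a uniform column through the inverse CDF of the desired marginal distribution.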

Suggested Citation

  • Chu, Amanda M.Y. & Ip, Chun Yin & Lam, Benson S.Y. & So, Mike K.P., 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
  • Handle: RePEc:eee:csdana:v:176:y:2022:i:c:s0167947322001414
    DOI: 10.1016/j.csda.2022.107561

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322001414
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2022.107561?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    References listed on IDEAS

    1. Rathindra Sarathy & Krishnamurty Muralidhar & Rahul Parsa, 2002. "Perturbing Nonnormal Confidential Attributes: The Copula Approach," Management Science, INFORMS, vol. 48(12), pages 1613-1627, December.
    2. Krishnamurty Muralidhar & Dinesh Batra & Peeter J. Kirs, 1995. "Accessibility, Security, and Accuracy in Statistical Databases: The Case for the Multiplicative Fixed Data Perturbation Approach," Management Science, INFORMS, vol. 41(9), pages 1549-1564, September.
    3. Trottini, Mario & Muralidhar, Krish & Sarathy, Rathindra, 2011. "Maintaining tail dependence in data shuffling using t copula," Statistics & Probability Letters, Elsevier, vol. 81(3), pages 420-428, March.
    4. Dißmann, J. & Brechmann, E.C. & Czado, C. & Kurowicka, D., 2013. "Selecting and estimating regular vine copulae and application to financial returns," Computational Statistics & Data Analysis, Elsevier, vol. 59(C), pages 52-69.
    5. Krishnamurty Muralidhar & Rathindra Sarathy, 2006. "Data Shuffling--A New Masking Approach for Numerical Data," Management Science, INFORMS, vol. 52(5), pages 658-670, May.
    6. Brechmann, Eike C. & Joe, Harry, 2015. "Truncation of vine copulas using fit indices," Journal of Multivariate Analysis, Elsevier, vol. 138(C), pages 19-33.
    7. Adelchi Azzalini & Antonella Capitanio, 2003. "Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t‐distribution," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(2), pages 367-389, May.
    8. Aas, Kjersti & Czado, Claudia & Frigessi, Arnoldo & Bakken, Henrik, 2009. "Pair-copula constructions of multiple dependence," Insurance: Mathematics and Economics, Elsevier, vol. 44(2), pages 182-198, April.
    9. Krishnamurty Muralidhar & Rahul Parsa & Rathindra Sarathy, 1999. "A General Additive Data Perturbation Method for Database Security," Management Science, INFORMS, vol. 45(10), pages 1399-1415, October.
    10. Satkartar K. Kinney & Jerome P. Reiter & Arnold P. Reznek & Javier Miranda & Ron S. Jarmin & John M. Abowd, 2011. "Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database," International Statistical Review, International Statistical Institute, vol. 79(3), pages 362-384, December.
    11. Jerome P. Reiter, 2005. "Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 168(1), pages 185-205, January.
    12. Stöber, Jakob & Hong, Hyokyoung Grace & Czado, Claudia & Ghosh, Pulak, 2015. "Comorbidity of chronic diseases in the elderly: Patterns identified by a copula design for mixed responses," Computational Statistics & Data Analysis, Elsevier, vol. 88(C), pages 28-39.
    13. Seokho Lee & Marc G. Genton & Reinaldo B. Arellano-Valle, 2010. "Perturbation of Numerical Confidential Data via Skew-t Distributions," Management Science, INFORMS, vol. 56(2), pages 318-333, February.
    14. Amanda M. Y. Chu & Benson S. Y. Lam & Agnes Tiwari & Mike K. P. So, 2019. "An Empirical Study of Applying Statistical Disclosure Control Methods to Public Health Research," IJERPH, MDPI, vol. 16(22), pages 1-17, November.
    15. Matthias Killiches & Claudia Czado, 2018. "A D‐vine copula‐based model for repeated measurements extending linear mixed models with homogeneous correlation structure," Biometrics, The International Biometric Society, vol. 74(3), pages 997-1005, September.
    16. So, Mike K.P. & Yeung, Cherry Y.T., 2014. "Vine-copula GARCH model with dynamic conditional dependence," Computational Statistics & Data Analysis, Elsevier, vol. 76(C), pages 655-671.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.
    2. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    3. Seokho Lee & Marc G. Genton & Reinaldo B. Arellano-Valle, 2010. "Perturbation of Numerical Confidential Data via Skew-t Distributions," Management Science, INFORMS, vol. 56(2), pages 318-333, February.
    4. Roger M. Cooke & Harry Joe & Bo Chang, 2020. "Vine copula regression for observational studies," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 104(2), pages 141-167, June.
    5. Genest, Christian & Scherer, Matthias, 2019. "The world of vines: An interview with Claudia Czado," Dependence Modeling, De Gruyter, vol. 7(1), pages 169-180, January.
    6. Hobæk Haff, Ingrid & Aas, Kjersti & Frigessi, Arnoldo & Lacal, Virginia, 2016. "Structure learning in Bayesian Networks using regular vines," Computational Statistics & Data Analysis, Elsevier, vol. 101(C), pages 186-208.
    7. Yi Qian & Hui Xie, 2013. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," NBER Working Papers 19586, National Bureau of Economic Research, Inc.
    8. Kjersti Aas, 2016. "Pair-Copula Constructions for Financial Applications: A Review," Econometrics, MDPI, vol. 4(4), pages 1-15, October.
    9. Trottini, Mario & Muralidhar, Krish & Sarathy, Rathindra, 2011. "Maintaining tail dependence in data shuffling using t copula," Statistics & Probability Letters, Elsevier, vol. 81(3), pages 420-428, March.
    10. Amanda M. Y. Chu & Benson S. Y. Lam & Agnes Tiwari & Mike K. P. So, 2019. "An Empirical Study of Applying Statistical Disclosure Control Methods to Public Health Research," IJERPH, MDPI, vol. 16(22), pages 1-17, November.
    11. Chang, Bo & Joe, Harry, 2019. "Prediction based on conditional distributions of vine copulas," Computational Statistics & Data Analysis, Elsevier, vol. 139(C), pages 45-63.
    12. Zhang, Bangzheng & Wei, Yu & Yu, Jiang & Lai, Xiaodong & Peng, Zhenfeng, 2014. "Forecasting VaR and ES of stock index portfolio: A Vine copula method," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 416(C), pages 112-124.
    13. Panagiotelis, Anastasios & Czado, Claudia & Joe, Harry & Stöber, Jakob, 2017. "Model selection for discrete regular vine copulas," Computational Statistics & Data Analysis, Elsevier, vol. 106(C), pages 138-152.
    14. Zhou, Rui & Ji, Min, 2021. "Modelling mortality dependence: An application of dynamic vine copula," Insurance: Mathematics and Economics, Elsevier, vol. 99(C), pages 241-255.
    15. Han, Xuyuan & Liu, Zhenya & Wang, Shixuan, 2022. "An R-vine copula analysis of non-ferrous metal futures with application in Value-at-Risk forecasting," Journal of Commodity Markets, Elsevier, vol. 25(C).
    16. Mejdoub, Hanène & Ben Arab, Mounira, 2018. "Impact of dependence modeling of non-life insurance risks on capital requirement: D-Vine Copula approach," Research in International Business and Finance, Elsevier, vol. 45(C), pages 208-218.
    17. Nigel Melville & Michael McQuaid, 2012. "Research Note: Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation," Information Systems Research, INFORMS, vol. 23(2), pages 559-574, June.
    18. Huang, Wanling & Mollick, André Varella & Nguyen, Khoa Huu, 2016. "U.S. stock markets and the role of real interest rates," The Quarterly Review of Economics and Finance, Elsevier, vol. 59(C), pages 231-242.
    19. Han, Yingwei & Li, Jie, 2022. "Should investors include green bonds in their portfolios? Evidence for the USA and Europe," International Review of Financial Analysis, Elsevier, vol. 80(C).
    20. Acar, Elif F. & Czado, Claudia & Lysy, Martin, 2019. "Flexible dynamic vine copula models for multivariate time series data," Econometrics and Statistics, Elsevier, vol. 12(C), pages 181-197.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:176:y:2022:i:c:s0167947322001414. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/csda.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.