Perturbation of Numerical Confidential Data via Skew-t Distributions

Author

Listed:
  • Seokho Lee

    (Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115)

  • Marc G. Genton

    (Department of Statistics, Texas A&M University, College Station, Texas 77843)

  • Reinaldo B. Arellano-Valle

    (Departamento de Estadística, Facultad de Matemática, Pontificia Universidad Católica de Chile, Santiago 22, Chile)

Abstract

We propose a new data perturbation method for numerical database security problems based on skew-t distributions. Unlike the normal distribution, the more general class of skew-t distributions is a flexible parametric multivariate family that can model skewness and heavy tails in the data. Because databases having a normal distribution are seldom encountered in practice, the newly proposed approach, coined the skew-t data perturbation (STDP) method, is of great interest for database managers. We also discuss how to preserve the sample mean vector and sample covariance matrix exactly for any data perturbation method. We investigate the performance of the STDP method by means of a Monte Carlo simulation study and compare it with other existing perturbation methods. Of particular importance is the ability of STDP to reproduce characteristics of the joint tails of the distribution in order for database users to answer higher-level questions. We apply the STDP method to a medical database related to breast cancer.
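
The abstract's point about preserving the sample mean vector and sample covariance matrix exactly can be illustrated with a generic moment-matching step. The sketch below is one standard way to do this (whiten the perturbed data with its own Cholesky factor, then recolor with the factor of the original data); it is not taken from the paper and may differ from the authors' construction. The function name `match_sample_moments`, the toy gamma-distributed data, and the additive normal noise standing in for a perturbation method are all assumptions made for illustration.

```python
import numpy as np


def match_sample_moments(original, perturbed):
    """Affinely rescale `perturbed` so that its sample mean vector and sample
    covariance matrix equal those of `original` (up to floating-point error).

    Both arguments are (n_records, n_attributes) arrays with at least two
    attributes. Assumes both sample covariance matrices are positive
    definite, which the Cholesky factorization requires.
    """
    x = np.asarray(original, dtype=float)
    y = np.asarray(perturbed, dtype=float)

    mean_x = x.mean(axis=0)
    mean_y = y.mean(axis=0)

    # Cholesky factors: cov = L @ L.T with L lower triangular.
    chol_x = np.linalg.cholesky(np.cov(x, rowvar=False))
    chol_y = np.linalg.cholesky(np.cov(y, rowvar=False))

    # Whiten the centered perturbed records: each row becomes
    # inv(chol_y) @ (y_i - mean_y), giving identity sample covariance.
    whitened = np.linalg.solve(chol_y, (y - mean_y).T).T

    # Recolor with the original covariance and shift to the original mean.
    return mean_x + whitened @ chol_x.T


# Toy illustration (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
confidential = rng.gamma(shape=2.0, scale=1.5, size=(200, 3))  # skewed attributes
# Stand-in perturbation: plain additive normal noise, NOT the STDP method.
noisy = confidential + rng.normal(scale=1.0, size=confidential.shape)
released = match_sample_moments(confidential, noisy)
```

Because the correction is a single affine map applied after perturbation, it does not depend on how `noisy` was generated, which is consistent with the abstract's claim that the idea applies to any data perturbation method. Note that an affine map changes the marginal shapes only by shifting, scaling, and mixing attributes, so it does not by itself restore skewness or tail behavior.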

Suggested Citation

  • Seokho Lee & Marc G. Genton & Reinaldo B. Arellano-Valle, 2010. "Perturbation of Numerical Confidential Data via Skew-t Distributions," Management Science, INFORMS, vol. 56(2), pages 318-333, February.
  • Handle: RePEc:inm:ormnsc:v:56:y:2010:i:2:p:318-333
    DOI: 10.1287/mnsc.1090.1104

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.1090.1104
    Download Restriction: no



    Citations

    Cited by:

    1. Azzalini, Adelchi, 2022. "An overview on the progeny of the skew-normal family— A personal perspective," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    2. Byungsoo Kim & Sangyeol Lee, 2014. "Minimum density power divergence estimator for covariance matrix based on skew t distribution," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 23(4), pages 565-575, November.
    3. Xiao-Bai Li & Sumit Sarkar, 2013. "Class-Restricted Clustering and Microperturbation for Data Privacy," Management Science, INFORMS, vol. 59(4), pages 796-812, April.
    4. Zifeng Zhao & Peng Shi & Xiaoping Feng, 2021. "Knowledge Learning of Insurance Risks Using Dependence Models," INFORMS Journal on Computing, INFORMS, vol. 33(3), pages 1177-1196, July.
    5. Nigel Melville & Michael McQuaid, 2012. "Research Note: Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation," Information Systems Research, INFORMS, vol. 23(2), pages 559-574, June.
    6. Chu, Amanda M.Y. & Ip, Chun Yin & Lam, Benson S.Y. & So, Mike K.P., 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    7. Trottini, Mario & Muralidhar, Krish & Sarathy, Rathindra, 2011. "Maintaining tail dependence in data shuffling using t copula," Statistics & Probability Letters, Elsevier, vol. 81(3), pages 420-428, March.
    8. Contreras-Reyes, Javier E., 2014. "Asymptotic form of the Kullback–Leibler divergence for multivariate asymmetric heavy-tailed distributions," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 395(C), pages 200-208.
    9. Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.
    10. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Azzalini, Adelchi, 2022. "An overview on the progeny of the skew-normal family— A personal perspective," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    2. Kim, Hyoung-Moon & Maadooliat, Mehdi & Arellano-Valle, Reinaldo B. & Genton, Marc G., 2016. "Skewed factor models using selection mechanisms," Journal of Multivariate Analysis, Elsevier, vol. 145(C), pages 162-177.
    3. Kim, Hyoung-Moon & Ryu, Duchwan & Mallick, Bani K. & Genton, Marc G., 2014. "Mixtures of skewed Kalman filters," Journal of Multivariate Analysis, Elsevier, vol. 123(C), pages 228-251.
    4. Adcock, C.J., 2014. "Mean–variance–skewness efficient surfaces, Stein’s lemma and the multivariate extended skew-Student distribution," European Journal of Operational Research, Elsevier, vol. 234(2), pages 392-401.
    5. Kim, Hyoung-Moon & Genton, Marc G., 2011. "Characteristic functions of scale mixtures of multivariate skew-normal distributions," Journal of Multivariate Analysis, Elsevier, vol. 102(7), pages 1105-1117, August.
    6. Arellano-Valle, Reinaldo B. & Ferreira, Clécio S. & Genton, Marc G., 2018. "Scale and shape mixtures of multivariate skew-normal distributions," Journal of Multivariate Analysis, Elsevier, vol. 166(C), pages 98-110.
    7. Chu, Amanda M.Y. & Ip, Chun Yin & Lam, Benson S.Y. & So, Mike K.P., 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    8. Tsung-I Lin & Pal Wu & Geoffrey McLachlan & Sharon Lee, 2015. "A robust factor analysis model using the restricted skew-t distribution," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(3), pages 510-531, September.
    9. Yulia V. Marchenko & Marc G. Genton, 2012. "A Heckman Selection- t Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 304-317, March.
    10. Reinaldo B. Arellano-Valle & Marc G. Genton, 2010. "Multivariate extended skew-t distributions and related families," Metron - International Journal of Statistics, Dipartimento di Statistica, Probabilità e Statistiche Applicate - University of Rome, vol. 0(3), pages 201-234.
    11. Cornelis J. Potgieter & Marc G. Genton, 2013. "Characteristic Function-based Semiparametric Inference for Skew-symmetric Models," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 40(3), pages 471-490, September.
    12. Reinaldo B. Arellano-Valle, 2010. "On the information matrix of the multivariate skew-t model," Metron - International Journal of Statistics, Dipartimento di Statistica, Probabilità e Statistiche Applicate - University of Rome, vol. 0(3), pages 371-386.
    13. Arellano-Valle, Reinaldo B. & Genton, Marc G. & Loschi, Rosangela H., 2009. "Shape mixtures of multivariate skew-normal distributions," Journal of Multivariate Analysis, Elsevier, vol. 100(1), pages 91-101, January.
    14. Zinoviy Landsman & Udi Makov & Tomer Shushi, 2017. "Extended Generalized Skew-Elliptical Distributions and their Moments," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 79(1), pages 76-100, February.
    15. Sharon Lee & Geoffrey McLachlan, 2013. "On mixtures of skew normal and skew t-distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(3), pages 241-266, September.
    16. Arellano-Valle, Reinaldo B. & Azzalini, Adelchi, 2013. "The centred parameterization and related quantities of the skew-t distribution," Journal of Multivariate Analysis, Elsevier, vol. 113(C), pages 73-90.
    17. Antonio Canale & Euloge Clovis Kenne Pagui & Bruno Scarpa, 2016. "Bayesian modeling of university first-year students' grades after placement test," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(16), pages 3015-3029, December.
    18. C. Adcock, 2010. "Asset pricing and portfolio selection based on the multivariate extended skew-Student-t distribution," Annals of Operations Research, Springer, vol. 176(1), pages 221-234, April.
    19. Hok Shing Kwong & Saralees Nadarajah, 2022. "A New Robust Class of Skew Elliptical Distributions," Methodology and Computing in Applied Probability, Springer, vol. 24(3), pages 1669-1691, September.
    20. Giorgi, Emanuele & McNeil, Alexander J., 2016. "On the computation of multivariate scenario sets for the skew-t and generalized hyperbolic families," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 205-220.
