
Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases

Authors

  • Yi Qian

    (Department of Marketing and Behavioral Science, Sauder School of Business, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada)

  • Hui Xie

    (Division of Epidemiology and Biostatistics, University of Illinois, Chicago, Illinois 60612)

Abstract

Databases play a central role in evidence-based innovations in business, economics, social, and health sciences. In modern business and society, there are rapidly growing demands for constructing analytically valid databases that also are secure and protect sensitive information to meet customer and public expectations, to minimize financial losses, and to comply with privacy regulations and laws. We propose new data perturbation and shuffling (DPS) procedures, named MORE, for this purpose. As compared with existing DPS methods, MORE can substantially increase the utility of secure databases without increasing disclosure risk. MORE is capable of preserving important nonmonotonic relationships among attributes, such as the inverted-U relationship between competition and innovation. Maintaining such relationships is often the key to determining optimal levels of policy and managerial interventions. MORE does not require data to be of particular types or have particular distributional shapes. Instead, it provides unified, flexible, and robust algorithms to mask general types of confidential variables with arbitrary distributions, thereby making it suitable for general-purpose data masking. Since MORE nests the commonly used generalized linear models as special cases, a much wider range of statistical analyses can be conducted by using the secure databases with results similar to those achieved by using the original databases. Unlike existing DPS approaches that typically require a joint model for all variables, MORE requires no modeling of nonconfidential variables and thus further increases the robustness of secure databases. Evaluation of MORE through Monte Carlo simulation studies and empirical applications demonstrates that it performs better than existing data-masking methods.

Data, as supplemental material, are available at http://dx.doi.org/10.1287/mnsc.2014.2026. This paper was accepted by Sandra Slaughter, information systems.

Suggested Citation

  • Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.
  • Handle: RePEc:inm:ormnsc:v:61:y:2015:i:3:p:520-541
    DOI: 10.1287/mnsc.2014.2026



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yi Qian & Hui Xie, 2013. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," NBER Working Papers 19586, National Bureau of Economic Research, Inc.
    2. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    3. Chu, Amanda M.Y. & Ip, Chun Yin & Lam, Benson S.Y. & So, Mike K.P., 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    4. Seokho Lee & Marc G. Genton & Reinaldo B. Arellano-Valle, 2010. "Perturbation of Numerical Confidential Data via Skew-t Distributions," Management Science, INFORMS, vol. 56(2), pages 318-333, February.
    5. Trottini, Mario & Muralidhar, Krish & Sarathy, Rathindra, 2011. "Maintaining tail dependence in data shuffling using t copula," Statistics & Probability Letters, Elsevier, vol. 81(3), pages 420-428, March.
    6. Woodcock, Simon D. & Benedetto, Gary, 2009. "Distribution-preserving statistical disclosure limitation," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4228-4242, October.
    7. Amanda M. Y. Chu & Benson S. Y. Lam & Agnes Tiwari & Mike K. P. So, 2019. "An Empirical Study of Applying Statistical Disclosure Control Methods to Public Health Research," IJERPH, MDPI, vol. 16(22), pages 1-17, November.
    8. Yi Qian, 2014. "Counterfeiters: Foes or Friends? How Counterfeits Affect Sales by Product Quality Tier," Management Science, INFORMS, vol. 60(10), pages 2381-2400, October.
    9. Yi Qian & Hui Xie, 2022. "Simplifying Bias Correction for Selective Sampling: A Unified Distribution-Free Approach to Handling Endogenously Selected Samples," Marketing Science, INFORMS, vol. 41(2), pages 336-360, March.
    10. Nigel Melville & Michael McQuaid, 2012. "Research Note: Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation," Information Systems Research, INFORMS, vol. 23(2), pages 559-574, June.
    11. Aghion, Philippe & Akcigit, Ufuk & Howitt, Peter, 2014. "What Do We Learn From Schumpeterian Growth Theory?," Handbook of Economic Growth, in: Philippe Aghion & Steven Durlauf (ed.), Handbook of Economic Growth, edition 1, volume 2, chapter 0, pages 515-563, Elsevier.
    12. Drechsler, Jörg & Reiter, Jerome P., 2011. "An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3232-3243, December.
    13. Lefouili, Yassine & Toh, Ying Lei & Madio, Leonardo, 2017. "Privacy Regulation and Quality-Enhancing Innovation," TSE Working Papers 17-795, Toulouse School of Economics (TSE), revised Jul 2023.
    14. Klein Martin & Sinha Bimal, 2013. "Statistical Analysis of Noise-Multiplied Data Using Multiple Imputation," Journal of Official Statistics, Sciendo, vol. 29(3), pages 425-465, June.
    15. Yi Qian & Hui Xie, 2014. "Which Brand Purchasers Are Lost to Counterfeiters? An Application of New Data Fusion Approaches," Marketing Science, INFORMS, vol. 33(3), pages 437-448, May.
    16. Malte Mosel, 2009. "Competition, imitation, and R&D productivity in a growth model with sector-specific patent protection," Working Papers 084, Bavarian Graduate Program in Economics (BGPE).
    17. Dosi, Giovanni & Palagi, Elisa & Roventini, Andrea & Russo, Emanuele, 2023. "Do patents really foster innovation in the pharmaceutical sector? Results from an evolutionary, agent-based model," Journal of Economic Behavior & Organization, Elsevier, vol. 212(C), pages 564-589.
    18. Brüggemann, Julia & Crosetto, Paolo & Meub, Lukas & Bizer, Kilian, 2016. "Intellectual property rights hinder sequential innovation. Experimental evidence," Research Policy, Elsevier, vol. 45(10), pages 2054-2068.
    19. Yi Qian & Hui Xie, 2011. "No Customer Left Behind: A Distribution-Free Bayesian Approach to Accounting for Missing Xs in Marketing Models," Marketing Science, INFORMS, vol. 30(4), pages 717-736, July.
    20. Rathindra Sarathy & Krishnamurty Muralidhar & Rahul Parsa, 2002. "Perturbing Nonnormal Confidential Attributes: The Copula Approach," Management Science, INFORMS, vol. 48(12), pages 1613-1627, December.
