IDEAS home Printed from https://ideas.repec.org/p/nbr/nberwo/19586.html

Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases

Author

Listed:
  • Yi Qian
  • Hui Xie

Abstract

Databases play a central role in evidence-based innovations in business, economics, and the social and health sciences. In modern business and society, there is rapidly growing demand for analytically valid databases that are also secure and protect sensitive information, in order to meet customer and public expectations, minimize financial losses, and comply with privacy regulations and laws. We propose new data perturbation and shuffling (DPS) procedures, named MORE, for this purpose. Compared with existing DPS methods, MORE can substantially increase the utility of secure databases without increasing disclosure risk. MORE is capable of preserving important nonmonotonic relationships among attributes, such as the inverted-U relationship between competition and innovation. Maintaining such relationships is often the key to determining optimal levels of policy and managerial interventions. MORE does not require data to be of particular types or have particular distributional shapes. Instead, it provides unified, flexible, and robust algorithms to mask general types of confidential variables with arbitrary distributions, making it suitable for general-purpose data masking. Because MORE nests the commonly used generalized linear models as special cases, a much wider range of statistical analyses can be conducted using the secure databases, with results similar to those obtained from the original databases. Unlike existing DPS approaches, which typically require a joint model for all variables, MORE requires no modeling of nonconfidential variables, which further increases the robustness of secure databases. Evaluation of MORE through Monte Carlo simulation studies and empirical applications demonstrates that it performs better than existing data masking methods.

Suggested Citation

  • Yi Qian & Hui Xie, 2013. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," NBER Working Papers 19586, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:19586
    Note: PR

    Download full text from publisher

    File URL: http://www.nber.org/papers/w19586.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Avi Goldfarb & Catherine Tucker, 2012. "Privacy and Innovation," NBER Chapters, in: Innovation Policy and the Economy, Volume 12, pages 65-89, National Bureau of Economic Research, Inc.
    2. Krishnamurty Muralidhar & Dinesh Batra & Peeter J. Kirs, 1995. "Accessibility, Security, and Accuracy in Statistical Databases: The Case for the Multiplicative Fixed Data Perturbation Approach," Management Science, INFORMS, vol. 41(9), pages 1549-1564, September.
    3. Krishnamurty Muralidhar & Rahul Parsa & Rathindra Sarathy, 1999. "A General Additive Data Perturbation Method for Database Security," Management Science, INFORMS, vol. 45(10), pages 1399-1415, October.
    4. Hua Yun Chen, 2007. "A Semiparametric Odds Ratio Model for Measuring Association," Biometrics, The International Biometric Society, vol. 63(2), pages 413-421, June.
    5. Philippe Aghion & Nick Bloom & Richard Blundell & Rachel Griffith & Peter Howitt, 2005. "Competition and Innovation: an Inverted-U Relationship," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 120(2), pages 701-728.
    6. Yi Qian & Hui Xie, 2011. "No Customer Left Behind: A Distribution-Free Bayesian Approach to Accounting for Missing Xs in Marketing Models," Marketing Science, INFORMS, vol. 30(4), pages 717-736, July.
    7. Xiao-Bai Li & Sumit Sarkar, 2011. "Protecting Privacy Against Record Linkage Disclosure: A Bounded Swapping Approach for Numeric Data," Information Systems Research, INFORMS, vol. 22(4), pages 774-789, December.
    8. Kim, Gunky & Silvapulle, Mervyn J. & Silvapulle, Paramsothy, 2007. "Comparison of semiparametric and parametric methods for estimating copulas," Computational Statistics & Data Analysis, Elsevier, vol. 51(6), pages 2836-2850, March.
    9. Rathindra Sarathy & Krishnamurty Muralidhar & Rahul Parsa, 2002. "Perturbing Nonnormal Confidential Attributes: The Copula Approach," Management Science, INFORMS, vol. 48(12), pages 1613-1627, December.
    10. Amalia R. Miller & Catherine E. Tucker, 2011. "Encryption and the loss of patient data," Journal of Policy Analysis and Management, John Wiley & Sons, Ltd., vol. 30(3), pages 534-556, June.
    11. Joakim Kalvenes & Amit Basu, 2006. "Design of Robust Business-to-Business Electronic Marketplaces with Guaranteed Privacy," Management Science, INFORMS, vol. 52(11), pages 1721-1736, November.
    12. Hua Yun Chen, 2004. "Nonparametric and Semiparametric Models for Missing Covariates in Parametric Regression," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 1176-1189, December.
    13. Syam Menon & Sumit Sarkar, 2007. "Minimizing Information Loss and Preserving Privacy," Management Science, INFORMS, vol. 53(1), pages 101-116, January.
    14. Reiter, Jerome P. & Raghunathan, Trivellore E., 2007. "The Multiple Adaptations of Multiple Imputation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 1462-1471, December.
    15. Jerome P. Reiter, 2005. "Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 168(1), pages 185-205, January.
    16. Yi Qian, 2007. "Do National Patent Laws Stimulate Domestic Innovation in a Global Patenting Environment? A Cross-Country Analysis of Pharmaceutical Patent Protection, 1978-2002," The Review of Economics and Statistics, MIT Press, vol. 89(3), pages 436-453, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.
    2. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    3. Chu, Amanda M.Y. & Ip, Chun Yin & Lam, Benson S.Y. & So, Mike K.P., 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    4. Woodcock, Simon D. & Benedetto, Gary, 2009. "Distribution-preserving statistical disclosure limitation," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4228-4242, October.
    5. Seokho Lee & Marc G. Genton & Reinaldo B. Arellano-Valle, 2010. "Perturbation of Numerical Confidential Data via Skew-t Distributions," Management Science, INFORMS, vol. 56(2), pages 318-333, February.
    6. Aghion, Philippe & Akcigit, Ufuk & Howitt, Peter, 2014. "What Do We Learn From Schumpeterian Growth Theory?," Handbook of Economic Growth, in: Philippe Aghion & Steven Durlauf (ed.), Handbook of Economic Growth, edition 1, volume 2, chapter 0, pages 515-563, Elsevier.
    7. Drechsler, Jörg & Reiter, Jerome P., 2011. "An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3232-3243, December.
    8. Lefouili, Yassine & Toh, Ying Lei & Madio, Leonardo, 2017. "Privacy Regulation and Quality-Enhancing Innovation," TSE Working Papers 17-795, Toulouse School of Economics (TSE), revised Jul 2023.
    9. Klein Martin & Sinha Bimal, 2013. "Statistical Analysis of Noise-Multiplied Data Using Multiple Imputation," Journal of Official Statistics, Sciendo, vol. 29(3), pages 425-465, June.
    10. Yi Qian & Hui Xie, 2014. "Which Brand Purchasers Are Lost to Counterfeiters? An Application of New Data Fusion Approaches," Marketing Science, INFORMS, vol. 33(3), pages 437-448, May.
    11. Malte Mosel, 2009. "Competition, imitation, and R&D productivity in a growth model with sector-specific patent protection," Working Papers 084, Bavarian Graduate Program in Economics (BGPE).
    12. Dosi, Giovanni & Palagi, Elisa & Roventini, Andrea & Russo, Emanuele, 2023. "Do patents really foster innovation in the pharmaceutical sector? Results from an evolutionary, agent-based model," Journal of Economic Behavior & Organization, Elsevier, vol. 212(C), pages 564-589.
    13. Amanda M. Y. Chu & Benson S. Y. Lam & Agnes Tiwari & Mike K. P. So, 2019. "An Empirical Study of Applying Statistical Disclosure Control Methods to Public Health Research," IJERPH, MDPI, vol. 16(22), pages 1-17, November.
    14. Brüggemann, Julia & Crosetto, Paolo & Meub, Lukas & Bizer, Kilian, 2016. "Intellectual property rights hinder sequential innovation. Experimental evidence," Research Policy, Elsevier, vol. 45(10), pages 2054-2068.
    15. Yi Qian & Hui Xie, 2011. "No Customer Left Behind: A Distribution-Free Bayesian Approach to Accounting for Missing Xs in Marketing Models," Marketing Science, INFORMS, vol. 30(4), pages 717-736, July.
    16. Rathindra Sarathy & Krishnamurty Muralidhar & Rahul Parsa, 2002. "Perturbing Nonnormal Confidential Attributes: The Copula Approach," Management Science, INFORMS, vol. 48(12), pages 1613-1627, December.
    17. Rathindra Sarathy & Krishnamurty Muralidhar, 2002. "The Security of Confidential Numerical Data in Databases," Information Systems Research, INFORMS, vol. 13(4), pages 389-403, December.
    18. Yi Qian, 2014. "Counterfeiters: Foes or Friends? How Counterfeits Affect Sales by Product Quality Tier," Management Science, INFORMS, vol. 60(10), pages 2381-2400, October.
    19. Yilin Li & Wang Miao & Ilya Shpitser & Eric J. Tchetgen Tchetgen, 2023. "A self‐censoring model for multivariate nonignorable nonmonotone missing data," Biometrics, The International Biometric Society, vol. 79(4), pages 3203-3214, December.
    20. Rockett, Katharine, 2010. "Property Rights and Invention," Handbook of the Economics of Innovation, in: Bronwyn H. Hall & Nathan Rosenberg (ed.), Handbook of the Economics of Innovation, edition 1, volume 1, chapter 0, pages 315-380, Elsevier.

    More about this item

    JEL classification:

    • M31 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Marketing and Advertising - - - Marketing
