
A Two-Stage Authorship Attribution Method Using Text and Structured Data for De-Anonymizing User-Generated Content

Authors

  • Matthew J. Schneider (Drexel University)
  • Shawn Mankad (Cornell University)

Abstract

User-generated content (UGC) is an important source of information on products and services for consumers and firms. Although incentivizing high-quality UGC is an important business objective for any content platform, we show that it is also possible to identify anonymous posters by exploiting the characteristics of posted content. We present a novel two-stage authorship attribution methodology that combines structured and text data: it identifies an author first by the amount and granularity of structured data (e.g., location, first name) posted with the UGC and second by the author’s writing style. As a case study, we show that 75% of the 1.3 million users in data publicly released by Yelp are uniquely identified by three structured variable combinations. For the remaining 25%, when the number of potential authors with (nearly) identical structured data ranges from 100 to 5 and sufficient training data exists for text analysis, the average probabilities of identification range from 40% to 81%. Our findings suggest that UGC platforms concerned with the potential negative effects of privacy-related incidents should limit or generalize their posters’ structured data when it is adjoined with textual content or mentioned in the text itself. We also show that although protection policies that focus on structured data remove the most predictive elements of authorship, they also have a small negative effect on the usefulness of content.
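To make the two-stage idea concrete, here is a minimal Python sketch. It is not the method from the paper; it rests on several assumptions. The field names `city` and `first_name` and the functions `uniqueness_rate` and `attribute` are hypothetical. Stage 1 uses exact matching on structured fields, whereas the paper allows (nearly) identical structured data. Stage 2 swaps in character-trigram cosine similarity as a simple stylometric stand-in for the authors' text model.

```python
import math
from collections import Counter

# Hypothetical record layout: each author has an "id", structured fields
# (e.g. "city", "first_name"), and a list of past review "texts".

def uniqueness_rate(records, keys):
    """Share of records whose combination of structured fields occurs only once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

def char_ngrams(text, n=3):
    """Character n-gram counts, a common stylometric feature (stand-in only)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def attribute(post, authors, keys, n=3):
    """Two-stage guess at the author of an anonymous post (illustrative only)."""
    # Stage 1: keep only authors whose structured fields match the post exactly.
    candidates = [a for a in authors if all(a[k] == post[k] for k in keys)]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]["id"]  # uniquely identified by structured data alone
    # Stage 2: pick the candidate whose past texts are most similar in style.
    target = char_ngrams(post["text"], n)
    best = max(candidates,
               key=lambda a: cosine(target, char_ngrams(" ".join(a["texts"]), n)))
    return best["id"]

authors = [
    {"id": "u1", "city": "Phoenix", "first_name": "Ann",
     "texts": ["Great tacos!!! Will def come back :)"]},
    {"id": "u2", "city": "Phoenix", "first_name": "Ann",
     "texts": ["The service was adequate; however, the soup was cold."]},
    {"id": "u3", "city": "Toronto", "first_name": "Bob",
     "texts": ["Nice place."]},
]
keys = ["city", "first_name"]
print(uniqueness_rate(authors, keys))  # only u3 is unique: 1 of 3 authors

post = {"city": "Phoenix", "first_name": "Ann",
        "text": "Amazing burritos!!! Will def return :)"}
print(attribute(post, authors, keys))  # u1
```

In this toy data, stage 1 alone identifies the Toronto author, while the two Phoenix authors named Ann can only be separated by writing style. A realistic stage 2 would use a trained classifier that outputs identification probabilities rather than a single nearest match.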

Suggested Citation

  • Matthew J. Schneider & Shawn Mankad, 2021. "A Two-Stage Authorship Attribution Method Using Text and Structured Data for De-Anonymizing User-Generated Content," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 8(3), pages 66-83, September.
  • Handle: RePEc:spr:custns:v:8:y:2021:i:3:d:10.1007_s40547-021-00116-x
    DOI: 10.1007/s40547-021-00116-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40547-021-00116-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shaobo Li & Matthew J. Schneider & Yan Yu & Sachin Gupta, 2023. "Reidentification Risk in Panel Data: Protecting for k-Anonymity," Information Systems Research, INFORMS, vol. 34(3), pages 1066-1088, September.
    2. Carlson, Keith & Kopalle, Praveen K. & Riddell, Allen & Rockmore, Daniel & Vana, Prasad, 2023. "Complementing human effort in online reviews: A deep learning approach to automatic content generation and review synthesis," International Journal of Research in Marketing, Elsevier, vol. 40(1), pages 54-74.
    3. Wieringa, Jaap & Kannan, P.K. & Ma, Xiao & Reutterer, Thomas & Risselada, Hans & Skiera, Bernd, 2021. "Data analytics in a privacy-concerned world," Journal of Business Research, Elsevier, vol. 122(C), pages 915-925.
    4. Lena Steinhoff & Denni Arli & Scott Weaven & Irina V. Kozlenkova, 2019. "Online relationship marketing," Journal of the Academy of Marketing Science, Springer, vol. 47(3), pages 369-393, May.
    5. Ao Shen & Peng Wang & Yongyuan Ma, 2022. "When crowding‐in and when crowding‐out? The boundary conditions on the relationship between negative online reviews and online sales," Managerial and Decision Economics, John Wiley & Sons, Ltd., vol. 43(6), pages 2016-2032, September.
    6. Matthew J. Schneider & Dawn Iacobucci, 2020. "Protecting survey data on a consumer level," Journal of Marketing Analytics, Palgrave Macmillan, vol. 8(1), pages 3-17, March.
    7. Yusong Wang & David Bell, 2015. "Consumer store choice in Asian markets," Marketing Letters, Springer, vol. 26(3), pages 293-308, September.
    8. Yuchi Zhang & David Godes, 2018. "Learning from Online Social Ties," Marketing Science, INFORMS, vol. 37(3), pages 425-444, May.
    9. Krämer, Jan & Wohlfarth, Michael, 2018. "Market power, regulatory convergence, and the role of data in digital markets," Telecommunications Policy, Elsevier, vol. 42(2), pages 154-171.
    10. Silvia Corbara & Alejandro Moreo & Fabrizio Sebastiani, 2023. "Syllabic quantity patterns as rhythmic features for Latin authorship attribution," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(1), pages 128-141, January.
    11. Ming-Hui Huang & Roland T. Rust, 2021. "A strategic framework for artificial intelligence in marketing," Journal of the Academy of Marketing Science, Springer, vol. 49(1), pages 30-50, January.
    12. Morlok, Tina & Matt, Christian & Hess, Thomas, 2017. "Privatheitsforschung in den Wirtschaftswissenschaften: Entwicklung, Stand und Perspektiven" [Privacy research in the economic sciences: development, current state, and perspectives], Working Papers 1/2017, University of Munich, Munich School of Management, Institute for Information Systems and New Media.
    13. Stefano Sbalchiero & Maria Stella Righettini, 2017. "Rhetorical manifestation of institutional transformation," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(3), pages 1279-1296, May.
    14. Guo, Qiaozhen & Chen, Ying-Ju & Huang, Wei, 2022. "Dynamic pricing of new experience products with dual-channel social learning and online review manipulations," Omega, Elsevier, vol. 109(C).
    15. Michael Kummer & Patrick Schulte, 2019. "When Private Information Settles the Bill: Money and Privacy in Google’s Market for Smartphone Applications," Management Science, INFORMS, vol. 65(8), pages 3470-3494, August.
    16. Ning Zhong & David A. Schweidel, 2020. "Capturing Changes in Social Media Content: A Multiple Latent Changepoint Topic Model," Marketing Science, INFORMS, vol. 39(4), pages 827-846, July.
    17. Mariia Petryk & Michael Rivera & Siddharth Bhattacharya & Liangfei Qiu & Subodha Kumar, 2022. "How Network Embeddedness Affects Real-Time Performance Feedback: An Empirical Investigation," Information Systems Research, INFORMS, vol. 33(4), pages 1467-1489, December.
    18. Hong, Ji Hyun & Kim, Byung Cho & Park, Kyung Sam, 2019. "Optimal risk management for the sharing economy with stranger danger and service quality," European Journal of Operational Research, Elsevier, vol. 279(3), pages 1024-1035.
    19. Du, Shuili & Xie, Chunyan, 2021. "Paradoxes of artificial intelligence in consumer markets: Ethical challenges and opportunities," Journal of Business Research, Elsevier, vol. 129(C), pages 961-974.
    20. Marios Kokkodis, 2021. "Dynamic, Multidimensional, and Skillset-Specific Reputation Systems for Online Work," Information Systems Research, INFORMS, vol. 32(3), pages 688-712, September.
