A Two-Stage Authorship Attribution Method Using Text and Structured Data for De-Anonymizing User-Generated Content

Authors

  • Matthew J. Schneider (Drexel University)

  • Shawn Mankad (Cornell University)

Abstract

User-generated content (UGC) is an important source of information on products and services for consumers and firms. Although incentivizing high-quality UGC is an important business objective for any content platform, we show that it is also possible to identify anonymous posters by exploiting the characteristics of posted content. We present a novel two-stage authorship attribution methodology that combines structured and text data by identifying an author first by the amount and granularity of structured data (e.g., location, first name) posted with the UGC and second by the author’s writing style. As a case study, we show that 75% of the 1.3 million users in data publicly released by Yelp are uniquely identified by three structured variable combinations. For the remaining 25%, when the number of potential authors with (nearly) identically structured data ranges from 100 to 5 and sufficient training data exists for text analysis, the average probabilities of identification range from 40 to 81%. Our findings suggest that UGC platforms concerned with the potential negative effects of privacy-related incidents should limit or generalize their posters’ structured data when it is adjoined with textual content or mentioned in the text itself. We also show that although protection policies that focus on structured data remove the most predictive elements of authorship, they also have a small negative effect on the usefulness of content.
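The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the field names (`first_name`, `city`, `known_text`), the toy users, and the use of character-trigram cosine similarity as a stand-in for the writing-style model are all assumptions made for illustration. Stage 1 narrows the candidate set to users whose structured data matches the post; stage 2 ranks the remaining candidates by stylistic similarity. The helper `unique_fraction` shows how one might measure the share of users singled out by structured data alone.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def unique_fraction(users, keys):
    """Share of users whose combination of structured values is unique."""
    combos = Counter(tuple(u[k] for k in keys) for u in users)
    unique = sum(1 for u in users if combos[tuple(u[k] for k in keys)] == 1)
    return unique / len(users)


def attribute(post, users, keys):
    """Stage 1: filter by structured data. Stage 2: rank by style similarity."""
    candidates = [u for u in users if all(u[k] == post[k] for k in keys)]
    if not candidates:
        return None
    if len(candidates) == 1:
        # Structured data alone identifies the author.
        return candidates[0]["id"]
    target = char_ngrams(post["text"])
    best = max(candidates, key=lambda u: cosine(target, char_ngrams(u["known_text"])))
    return best["id"]


# Hypothetical toy data; not drawn from the Yelp data set.
KEYS = ("first_name", "city")
users = [
    {"id": "u1", "first_name": "Ann", "city": "Phoenix",
     "known_text": "Loved it!!! The tacos were sooo good!!!"},
    {"id": "u2", "first_name": "Ann", "city": "Phoenix",
     "known_text": "The service was adequate; however, the entree arrived cold."},
    {"id": "u3", "first_name": "Bob", "city": "Tempe",
     "known_text": "Decent place, fair prices."},
]
post = {"first_name": "Ann", "city": "Phoenix",
        "text": "Sooo good!!! Loved the burritos!!!"}

print(unique_fraction(users, KEYS))
print(attribute(post, users, KEYS))
```

In this toy setup, only one of the three users has a unique name and city combination, so the anonymous post from "Ann" in "Phoenix" falls through to the stylistic comparison. A real study would need far more training text per author and a properly validated classifier.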

Suggested Citation

  • Matthew J. Schneider & Shawn Mankad, 2021. "A Two-Stage Authorship Attribution Method Using Text and Structured Data for De-Anonymizing User-Generated Content," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 8(3), pages 66-83, September.
  • Handle: RePEc:spr:custns:v:8:y:2021:i:3:d:10.1007_s40547-021-00116-x
    DOI: 10.1007/s40547-021-00116-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40547-021-00116-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shaobo Li & Matthew J. Schneider & Yan Yu & Sachin Gupta, 2023. "Reidentification Risk in Panel Data: Protecting for k-Anonymity," Information Systems Research, INFORMS, vol. 34(3), pages 1066-1088, September.
    2. Carlson, Keith & Kopalle, Praveen K. & Riddell, Allen & Rockmore, Daniel & Vana, Prasad, 2023. "Complementing human effort in online reviews: A deep learning approach to automatic content generation and review synthesis," International Journal of Research in Marketing, Elsevier, vol. 40(1), pages 54-74.
    3. Wieringa, Jaap & Kannan, P.K. & Ma, Xiao & Reutterer, Thomas & Risselada, Hans & Skiera, Bernd, 2021. "Data analytics in a privacy-concerned world," Journal of Business Research, Elsevier, vol. 122(C), pages 915-925.
    4. Chan, Haksin & Yang, Morgan X. & Zeng, Kevin J., 2022. "Bolstering ratings and reviews systems on multi-sided platforms: A co-creation perspective," Journal of Business Research, Elsevier, vol. 139(C), pages 208-217.
    5. Hülya Karaman, 2021. "Online Review Solicitations Reduce Extremity Bias in Online Review Distributions and Increase Their Representativeness," Management Science, INFORMS, vol. 67(7), pages 4420-4445, July.
    6. Dominik Gutt & Jürgen Neumann & Steffen Zimmermann & Dennis Kundisch & Jianqing Chen, 2018. "Design of Review Systems - A Strategic Instrument to shape Online Review Behavior and Economic Outcomes," Working Papers Dissertations 42, Paderborn University, Faculty of Business Administration and Economics.
    7. Erfan Rezvani & Christian Rojas, 2022. "Firm responsiveness to consumers' reviews: The effect on online reputation," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 31(4), pages 898-922, November.
    8. Lena Steinhoff & Denni Arli & Scott Weaven & Irina V. Kozlenkova, 2019. "Online relationship marketing," Journal of the Academy of Marketing Science, Springer, vol. 47(3), pages 369-393, May.
    9. Budzinski, Oliver, 2016. "Aktuelle Herausforderungen der Wettbewerbspolitik durch Marktplätze im Internet," Ilmenau Economics Discussion Papers 103, Ilmenau University of Technology, Institute of Economics.
    10. Ana Babić Rosario & Kristine Valck & Francesca Sotgiu, 2020. "Conceptualizing the electronic word-of-mouth process: What we know and need to know about eWOM creation, exposure, and evaluation," Journal of the Academy of Marketing Science, Springer, vol. 48(3), pages 422-448, May.
    11. Guha, Abhijit & Grewal, Dhruv & Kopalle, Praveen K. & Haenlein, Michael & Schneider, Matthew J. & Jung, Hyunseok & Moustafa, Rida & Hegde, Dinesh R. & Hawkins, Gary, 2021. "How artificial intelligence will affect the future of retailing," Journal of Retailing, Elsevier, vol. 97(1), pages 28-41.
    12. Bandara, Ruwan & Fernando, Mario & Akter, Shahriar, 2020. "Explicating the privacy paradox: A qualitative inquiry of online shopping consumers," Journal of Retailing and Consumer Services, Elsevier, vol. 52(C).
    13. Ao Shen & Peng Wang & Yongyuan Ma, 2022. "When crowding‐in and when crowding‐out? The boundary conditions on the relationship between negative online reviews and online sales," Managerial and Decision Economics, John Wiley & Sons, Ltd., vol. 43(6), pages 2016-2032, September.
    14. Matthew J. Schneider & Dawn Iacobucci, 2020. "Protecting survey data on a consumer level," Journal of Marketing Analytics, Palgrave Macmillan, vol. 8(1), pages 3-17, March.
    15. Yusong Wang & David Bell, 2015. "Consumer store choice in Asian markets," Marketing Letters, Springer, vol. 26(3), pages 293-308, September.
    16. Bleier, Alexander & Goldfarb, Avi & Tucker, Catherine, 2020. "Consumer privacy and the future of data-based innovation and marketing," International Journal of Research in Marketing, Elsevier, vol. 37(3), pages 466-480.
    17. Yuchi Zhang & David Godes, 2018. "Learning from Online Social Ties," Marketing Science, INFORMS, vol. 37(3), pages 425-444, May.
    18. Kargin, Vladislav, 2016. "On variation of word frequencies in Russian literary texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 445(C), pages 328-334.
    19. Jiang, Guoyin & Shang, Jennifer & Liu, Wenping & Feng, Xiaodong & Lei, Junli, 2020. "Modeling the dynamics of online review life cycle: Role of social and economic moderations," European Journal of Operational Research, Elsevier, vol. 285(1), pages 360-379.
    20. Zhao, Yan & Wen, Lingling & Feng, Xiangnan & Li, Ran & Lin, Xiaolin, 2020. "How managerial responses to online reviews affect customer satisfaction: An empirical study based on additional reviews," Journal of Retailing and Consumer Services, Elsevier, vol. 57(C).
