Increasing sample size compensates for data problems in segmentation studies

Author

Listed:
  • Dolnicar, Sara
  • Grün, Bettina
  • Leisch, Friedrich

Abstract

Survey data frequently serve as the basis for market segmentation studies. Survey data, however, are prone to a range of biases. Little is known about the effects of such biases on the quality of data-driven market segmentation solutions. This study uses artificial data sets of known structure to study the effects of data problems on segment recovery. Some of the data problems under study are partially under the control of market research companies, some are outside their control. Results indicate that (1) insufficient sample sizes lead to suboptimal segmentation solutions; (2) biases in survey data have a strong negative effect on segment recovery; (3) increasing the sample size can compensate for some biases; (4) the effect of sample size increase on segment recovery demonstrates decreasing marginal returns; and—for highly detrimental biases—(5) improvement in segment recovery at high sample size levels occurs only if additional data is free of bias.
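
One way to make "segment recovery" concrete is to compare the segment labels a clustering procedure returns with the known labels of the artificial data. The sketch below is only an illustration: it assumes recovery is scored with the adjusted Rand index of Hubert and Arabie (1985), which the abstract itself does not state, and the function name and toy labels are hypothetical. A score of 1 means the known segments were recovered exactly; scores near 0 mean agreement is no better than chance.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, found_labels):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(true_labels) != len(found_labels):
        raise ValueError("both labelings must cover the same respondents")
    n = len(true_labels)
    if n < 2:
        raise ValueError("need at least two respondents")

    # Contingency counts: cells n_ij, row totals a_i, column totals b_j.
    cells = Counter(zip(true_labels, found_labels))
    rows = Counter(true_labels)
    cols = Counter(found_labels)

    # Number of respondent pairs placed together in each table cell / margin.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # agreement of a perfect match
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy data: six respondents in two known segments.
known = [0, 0, 0, 1, 1, 1]
relabeled = [1, 1, 1, 0, 0, 0]      # same segments, different label names
one_mistake = [0, 0, 1, 1, 1, 1]    # one respondent assigned to the wrong segment

perfect = adjusted_rand_index(known, relabeled)
degraded = adjusted_rand_index(known, one_mistake)
print(perfect, round(degraded, 3))
```

Because the index ignores label names, it suits simulation designs like the one described, where the clustering algorithm numbers segments arbitrarily. Repeating the comparison across sample sizes and bias conditions would give recovery curves of the kind the results summarize.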

Suggested Citation

  • Dolnicar, Sara & Grün, Bettina & Leisch, Friedrich, 2016. "Increasing sample size compensates for data problems in segmentation studies," Journal of Business Research, Elsevier, vol. 69(2), pages 992-999.
  • Handle: RePEc:eee:jbrese:v:69:y:2016:i:2:p:992-999
    DOI: 10.1016/j.jbusres.2015.09.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0148296315003926
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Moriarty, Rowland T. & Reibstein, David J., 1986. "Benefit segmentation in industrial markets," Journal of Business Research, Elsevier, vol. 14(6), pages 463-486, December.
    2. Geert De Soete & Wayne DeSarbo & J. Carroll, 1985. "Optimal variable weighting for hierarchical clustering: An alternating least-squares algorithm," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 173-192, December.
    3. Alvarez, Cecilia M.O. & Dickson, Peter R. & Hunter, Gary K., 2014. "The four faces of the Hispanic consumer: An acculturation-based segmentation," Journal of Business Research, Elsevier, vol. 67(2), pages 108-115.
    4. Goldsmith, Ronald E., 1988. "Spurious response error in a new product survey," Journal of Business Research, Elsevier, vol. 17(3), pages 271-281, November.
    5. Sara Dolnicar & Friedrich Leisch, 2010. "Evaluation of structure and reproducibility of cluster solutions using the bootstrap," Marketing Letters, Springer, vol. 21(1), pages 83-101, March.
    6. Cathy Maugis & Gilles Celeux & Marie-Laure Martin-Magniette, 2009. "Variable Selection for Clustering with Gaussian Mixture Models," Biometrics, The International Biometric Society, vol. 65(3), pages 701-709, September.
    7. John R. Rossiter, 2011. "Measurement for the Social Sciences," Springer Books, Springer, number 978-1-4419-7158-6, November.
    8. Schaninger, Charles M. & Buss, W. Christian, 1986. "Removing response-style effects in attribute-determinance ratings to identify market segments," Journal of Business Research, Elsevier, vol. 14(3), pages 237-252, June.
    9. Bhatnagar, Amit & Ghose, Sanjoy, 2004. "Segmenting consumers based on the benefits and risks of Internet shopping," Journal of Business Research, Elsevier, vol. 57(12), pages 1352-1360, December.
    10. Maugis, C. & Celeux, G. & Martin-Magniette, M.-L., 2009. "Variable selection in model-based clustering: A general variable role modeling," Computational Statistics & Data Analysis, Elsevier, vol. 53(11), pages 3872-3882, September.
    11. Coussement, Kristof & Van den Bossche, Filip A.M. & De Bock, Koen W., 2014. "Data accuracy's impact on segmentation performance: Benchmarking RFM analysis, logistic regression, and decision trees," Journal of Business Research, Elsevier, vol. 67(1), pages 2751-2758.
    12. Sullivan, Mary Kay & Miller, Alex, 1996. "Segmenting the informal venture capital market: Economic, hedonistic, and altruistic investors," Journal of Business Research, Elsevier, vol. 36(1), pages 25-35, May.
    13. Leisch, Friedrich, 2004. "FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 11(i08).
    14. Peterson, Robert A. & Sharpe, Louis K., 1973. "Market segmentation: Product usage patterns and psychographic configurations," Journal of Business Research, Elsevier, vol. 1(1), pages 11-20.
    15. Tellis, Gerard J. & Chandrasekaran, Deepa, 2010. "Extent and impact of response biases in cross-national survey research," International Journal of Research in Marketing, Elsevier, vol. 27(4), pages 329-341.
    16. Jarl Kampen & Marc Swyngedouw, 2000. "The Ordinal Controversy Revisited," Quality & Quantity: International Journal of Methodology, Springer, vol. 34(1), pages 87-102, February.
    17. Athanassopoulos, Antreas D., 2000. "Customer Satisfaction Cues To Support Market Segmentation and Explain Switching Behavior," Journal of Business Research, Elsevier, vol. 47(3), pages 191-207, March.
    18. Grün, Bettina & Leisch, Friedrich, 2008. "FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i04).
    19. Melnykov, Volodymyr & Chen, Wei-Chen & Maitra, Ranjan, 2012. "MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 51(i12).
    20. Evgenia Dimitriadou & Sara Dolničar & Andreas Weingessel, 2002. "An examination of indexes for determining the number of clusters in binary data sets," Psychometrika, Springer;The Psychometric Society, vol. 67(1), pages 137-159, March.
    21. Wayne DeSarbo & J. Carroll & Linda Clark & Paul Green, 1984. "Synthesized clustering: A method for amalgamating alternative clustering bases with differential weighting of variables," Psychometrika, Springer;The Psychometric Society, vol. 49(1), pages 57-78, March.
    22. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    23. Wayne DeSarbo & Vijay Mahajan, 1984. "Constrained classification: The use of a priori information in cluster analysis," Psychometrika, Springer;The Psychometric Society, vol. 49(2), pages 187-215, June.
    24. Glenn Milligan, 1980. "An examination of the effect of six types of error perturbation on fifteen clustering algorithms," Psychometrika, Springer;The Psychometric Society, vol. 45(3), pages 325-342, September.
    25. Steenkamp, Jan-Benedict E. M. & Wedel, Michel, 1993. "Fuzzy clusterwise regression in benefit segmentation: Application and investigation into its validity," Journal of Business Research, Elsevier, vol. 26(3), pages 237-249, March.
    26. Raftery, Adrian E. & Dean, Nema, 2006. "Variable Selection for Model-Based Clustering," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 168-178, March.
    27. Chris Fraley & Adrian E. Raftery, 2007. "Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 24(2), pages 155-181, September.
    28. Floh, Arne & Zauner, Alexander & Koller, Monika & Rusch, Thomas, 2014. "Customer segmentation using unobserved heterogeneity in the perceived-value–loyalty–intentions link," Journal of Business Research, Elsevier, vol. 67(5), pages 974-982.
    29. Roberts, John H. & Kayande, Ujwal & Stremersch, Stefan, 2014. "From academic research to marketing practice: Exploring the marketing science value chain," International Journal of Research in Marketing, Elsevier, vol. 31(2), pages 127-140.

    Citations


    Cited by:

    1. Pallant, Jason I. & Pallant, Jessica L. & Sands, Sean J. & Ferraro, Carla R. & Afifi, Eslam, 2022. "When and how consumers are willing to exchange data with retailers: An exploratory segmentation," Journal of Retailing and Consumer Services, Elsevier, vol. 64(C).
    2. Nadeem Uz Zaman & Zainab Bibi & Sana Ur Rehman Sheikh & Abdul Raziq, 2020. "Manualizing Factor Analysis of Likert Scale Data," Journal of Management Sciences, Geist Science, Iqra University, Faculty of Business Administration, vol. 7(2), pages 56-67, October.
    3. Fernando Fonseca & Elisa Conticelli & George Papageorgiou & Paulo Ribeiro & Mona Jabbari & Simona Tondelli & Rui Ramos, 2021. "Levels and Characteristics of Utilitarian Walking in the Central Areas of the Cities of Bologna and Porto," Sustainability, MDPI, vol. 13(6), pages 1-22, March.
    4. Sunil Sahadev & Neeru Malhotra & Avinandan (Avi) Mukherjee, 2020. "Segmenting Excessive Alcohol Consumers: Implications for Social Marketing," IIM Kozhikode Society & Management Review, vol. 9(2), pages 213-225, July.
    5. Monica Ewomazino Akokuwebe & Erhabor Sunday Idemudia, 2021. "Multilevel Analysis of Urban–Rural Variations of Body Weights and Individual-Level Factors among Women of Childbearing Age in Nigeria and South Africa: A Cross-Sectional Survey," IJERPH, MDPI, vol. 19(1), pages 1-30, December.
    6. Eric Yaw Naminse & Jincai Zhuang, 2018. "Does farmer entrepreneurship alleviate rural poverty in China? Evidence from Guangxi Province," PLOS ONE, Public Library of Science, vol. 13(3), pages 1-18, March.
    7. Pallant, Jessica & Sands, Sean & Karpen, Ingo, 2020. "Product customization: A profile of consumer demand," Journal of Retailing and Consumer Services, Elsevier, vol. 54(C).
    8. Sands, Sean & Ferraro, Carla & Campbell, Colin & Kietzmann, Jan & Andonopoulos, Vasiliki Vicki, 2020. "Who shares? Profiling consumers in the sharing economy," Australasian Marketing Journal, Elsevier, vol. 28(3), pages 22-33.
    9. Sands, Sean & Maggioni, Isabella & Ferraro, Carla & Jebarajakirthy, Charles & Dharmesti, Maria, 2019. "The vice and virtue of on-the-go consumption: An exploratory segmentation," Journal of Retailing and Consumer Services, Elsevier, vol. 51(C), pages 399-408.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Douglas Steinley & Michael Brusco, 2008. "Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures," Psychometrika, Springer;The Psychometric Society, vol. 73(1), pages 125-144, March.
    2. Florian Schreiber, 2017. "Identification of customer groups in the German term life market: a benefit segmentation," Annals of Operations Research, Springer, vol. 254(1), pages 365-399, July.
    3. Melnykov, Volodymyr, 2016. "Model-based biclustering of clickstream data," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 31-45.
    4. Matthieu Marbac & Mohammed Sedki & Étienne Patin, 2020. "Variable Selection for Mixed Data Clustering: Application in Human Population Genomics," Journal of Classification, Springer;The Classification Society, vol. 37(1), pages 124-142, April.
    5. Wayne S. DeSarbo & Qian Chen & Ashley Stadler Blank, 2017. "A Parametric Constrained Segmentation Methodology for Application in Sport Marketing," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 4(4), pages 37-55, December.
    6. Maugis, C. & Celeux, G. & Martin-Magniette, M.-L., 2011. "Variable selection in model-based discriminant analysis," Journal of Multivariate Analysis, Elsevier, vol. 102(10), pages 1374-1387, November.
    7. Renato Cordeiro Amorim, 2016. "A Survey on Feature Weighting Based K-Means Algorithms," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 210-242, July.
    8. J. Fernando Vera & Rodrigo Macías, 2021. "On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling," Psychometrika, Springer;The Psychometric Society, vol. 86(2), pages 489-513, June.
    9. Derek S. Young & Xi Chen & Dilrukshi C. Hewage & Ricardo Nilo-Poyanco, 2019. "Finite mixture-of-gamma distributions: estimation, inference, and model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 1053-1082, December.
    10. Michael Brusco & Douglas Steinley, 2007. "A Comparison of Heuristic Procedures for Minimum Within-Cluster Sums of Squares Partitioning," Psychometrika, Springer;The Psychometric Society, vol. 72(4), pages 583-600, December.
    11. Anzanello, Michel J. & Fogliatto, Flavio S., 2011. "Selecting the best clustering variables for grouping mass-customized products involving workers' learning," International Journal of Production Economics, Elsevier, vol. 130(2), pages 268-276, April.
    12. Wayne DeSarbo & Richard Oliver & Arvind Rangaswamy, 1989. "A simulated annealing methodology for clusterwise linear regression," Psychometrika, Springer;The Psychometric Society, vol. 54(4), pages 707-736, September.
    13. Sanjeena Subedi & Antonio Punzo & Salvatore Ingrassia & Paul McNicholas, 2013. "Clustering and classification via cluster-weighted factor analyzers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(1), pages 5-40, March.
    14. Jerzy Korzeniewski, 2016. "New Method Of Variable Selection For Binary Data Cluster Analysis," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 17(2), pages 295-304, June.
    15. Paul D. McNicholas, 2016. "Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 331-373, October.
    16. Efthymios Costa & Ioanna Papatsouma & Angelos Markos, 2023. "Benchmarking distance-based partitioning methods for mixed-type data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 701-724, September.
    17. Crook, Oliver M. & Gatto, Laurent & Kirk, Paul D. W., 2019. "Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(6), pages 1-20, December.
    18. Tsai, Chieh-Yuan & Chiu, Chuang-Cheng, 2008. "Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4658-4672, June.
    19. Monia Ranalli & Roberto Rocci, 2017. "A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data," Psychometrika, Springer;The Psychometric Society, vol. 82(4), pages 1007-1034, December.
    20. Cappozzo, Andrea & Greselin, Francesca & Murphy, Thomas Brendan, 2021. "Robust variable selection for model-based learning in presence of adulteration," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
