Evaluating the cost of simplicity in score building: An example from alcohol research

Authors

Valentin Rousson
Bastien Trächsel
Katia Iglesias
Stéphanie Baggio

Abstract

Building a score from a questionnaire to predict a binary gold standard is a common research question in psychology and health sciences. When building this score, researchers may have to choose between statistical performance and simplicity. A practical question is to what extent it is worth sacrificing the former to improve the latter. We investigated this question using real data, in which the aim was to predict an alcohol use disorder (AUD) diagnosis from 20 self-reported binary questions in young Swiss men (n = 233, mean age = 26). Using the area under the ROC curve (AUC) as the measure of statistical performance, we compared (a) a “refined score” obtained by logistic regression with several simplified versions of it (“simple scores”) using (b) 3, (c) 2, and (d) 1 digit(s), and with (e) a “sum score” that did not allow negative coefficients. We used four estimation methods: (a) maximum likelihood, (b) backward selection, (c) LASSO, and (d) ridge penalty. We also used bootstrap procedures to correct for optimism. Simple scores, especially sum scores, performed almost identically to or even slightly better than the refined score (respective ranges of corrected AUCs for refined and sum scores: 0.828–0.848, 0.835–0.850), with the best performance being achieved by LASSO. Our example data demonstrated that simplifying a score to predict a binary outcome does not necessarily imply a major loss in statistical performance, while it may improve its implementation, interpretation, and acceptability. Our study thus provides further empirical evidence of the potential benefits of using sum scores in psychology and health sciences.
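The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the coefficients, the simulated data, and the helper names (`auc`, `simplify`, `sum_score_weights`, `total`, `demo`) are all hypothetical, and the rounding rule (rescaling so the largest absolute coefficient uses the requested number of digits) and the 0/1 sum-score rule are assumptions about what "simple" and "sum" scores might look like. The estimation methods and the bootstrap optimism correction are left out.

```python
import math
import random


def auc(scores, labels):
    """AUC as the Mann-Whitney statistic: share of (case, non-case) pairs
    in which the case has the higher score, with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simplify(weights, digits):
    """Assumed rounding rule: rescale so the largest |weight| becomes
    10**digits - 1, then round every weight to an integer."""
    scale = (10 ** digits - 1) / max(abs(w) for w in weights)
    return [round(w * scale) for w in weights]


def sum_score_weights(weights):
    """Assumed sum-score rule: weight 1 for positive coefficients, else 0,
    so no item can contribute negatively."""
    return [1 if w > 0 else 0 for w in weights]


def total(items, weights):
    """Weighted sum of binary item responses."""
    return sum(w * x for w, x in zip(weights, items))


def demo(seed=0, n=200):
    """Simulate binary items and an outcome, then compare AUCs of the
    refined score and its simplified versions."""
    rng = random.Random(seed)
    refined = [1.2, 0.8, 0.5, 0.3, -0.2]  # hypothetical, not from the paper
    rows, labels = [], []
    for _ in range(n):
        items = [int(rng.random() < 0.4) for _ in refined]
        p = 1 / (1 + math.exp(-(total(items, refined) - 1.0)))
        rows.append(items)
        labels.append(int(rng.random() < p))
    variants = {
        "refined": refined,
        "2 digits": simplify(refined, 2),
        "1 digit": simplify(refined, 1),
        "sum": sum_score_weights(refined),
    }
    return {name: auc([total(r, w) for r in rows], labels)
            for name, w in variants.items()}


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name:>8}: AUC = {value:.3f}")
```

Because the AUC depends only on how a score ranks individuals, coarse integer weights that roughly preserve that ranking may lose little, which is the intuition behind the small differences reported in the abstract.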

Suggested Citation

Valentin Rousson & Bastien Trächsel & Katia Iglesias & Stéphanie Baggio, 2023. "Evaluating the cost of simplicity in score building: An example from alcohol research," PLOS ONE, Public Library of Science, vol. 18(11), pages 1-10, November.

Handle: RePEc:plo:pone00:0294671
DOI: 10.1371/journal.pone.0294671

References listed on IDEAS

S. K. Vines, 2000. "Simple principal components," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 49(4), pages 441-451.
Valentin Rousson & Theo Gasser, 2004. "Simple component analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 53(4), pages 539-555, November.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Trendafilov, Nickolay T. & Vines, Karen, 2009. "Simple and interpretable discrimination," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 979-989, February.
Sabatier, Robert & Reynès, Christelle, 2008. "Extensions of simple component analysis and simple linear discriminant analysis using genetic algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4779-4789, June.
Nickolay Trendafilov, 2014. "From simple structure to sparse components: a review," Computational Statistics, Springer, vol. 29(3), pages 431-454, June.
Antonello D’Ambra & Pietro Amenta, 2023. "An extension of correspondence analysis based on the multiple Taguchi’s index to evaluate the relationships between three categorical variables graphically: an application to the Italian football championship," Annals of Operations Research, Springer, vol. 325(1), pages 219-244, June.
Kim, Hyun Hak & Swanson, Norman R., 2018. "Mining big data using parsimonious factor, machine learning, variable selection and shrinkage methods," International Journal of Forecasting, Elsevier, vol. 34(2), pages 339-354.
Norman R. Swanson, 2016. "Comment," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(3), pages 348-353, July.
Luca Scrucca, 2006. "Subset selection in dimension reduction methods," Quaderni del Dipartimento di Economia, Finanza e Statistica 23/2006, Università di Perugia, Dipartimento Economia.
Ronald Gunderson & Pin Ng, 2006. "Summarizing the Effect of a Wide Array of Amenity Measures into Simple Components," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 79(2), pages 313-335, November.
Daniel Velez & Jorge Sueiras & Alejandro Ortega & Jose F. Velez, 2016. "A method for K-Means seeds generation applied to text mining," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 25(3), pages 477-499, August.
E. Raffinetti & I. Romeo, 2015. "Dealing with the biased effects issue when handling huge datasets: the case of INVALSI data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(12), pages 2554-2570, December.
Hugh Chipman & Hong Gu, 2005. "Interpretable dimension reduction," Journal of Applied Statistics, Taylor & Francis Journals, vol. 32(9), pages 969-987.
Park, Juhyun & Gasser, Theo & Rousson, Valentin, 2009. "Structural components in functional data," Computational Statistics & Data Analysis, Elsevier, vol. 53(9), pages 3452-3465, July.
Choulakian, V. & Allard, J. & Almhana, J., 2006. "Robust centroid method," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 737-746, November.
Shen, Haipeng & Huang, Jianhua Z., 2008. "Sparse principal component analysis via regularized low rank matrix approximation," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1015-1034, July.
Jolliffe, Ian, 2022. "A 50-year personal journey through time with principal component analysis," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
T. F. Cox & D. S. Arnold, 2018. "Simple components," Journal of Applied Statistics, Taylor & Francis Journals, vol. 45(1), pages 83-99, January.
José Fernando Romero Cañizares & Purificación Vicente Galindo & Yannis Phillis & Evangelos Grigoroudis, 2022. "Graphical sustainability analysis using disjoint biplots," Operational Research, Springer, vol. 22(2), pages 1575-1596, April.
Carlo Cavicchia & Maurizio Vichi & Giorgia Zaccaria, 2023. "Hierarchical disjoint principal component analysis," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 107(3), pages 537-574, September.
Doyo Enki & Nickolay Trendafilov, 2012. "Sparse principal components by semi-partition clustering," Computational Statistics, Springer, vol. 27(4), pages 605-626, December.
Hyun Hak Kim & Norman Swanson, 2013. "Mining Big Data Using Parsimonious Factor and Shrinkage Methods," Departmental Working Papers 201316, Rutgers University, Department of Economics.
