Predicting Missing Values in Survey Data Using Prompt Engineering for Addressing Item Non-Response

Author

Junyung Ji (Department of Applied Artificial Intelligence, Hanyang University at Ansan, Ansan 15588, Republic of Korea; these authors contributed equally to this work)
Jiwoo Kim (Department of Applied Artificial Intelligence, Hanyang University at Ansan, Ansan 15588, Republic of Korea; these authors contributed equally to this work)
Younghoon Kim (Department of Applied Artificial Intelligence, Hanyang University at Ansan, Ansan 15588, Republic of Korea)

Abstract

Survey data play a crucial role in various research fields, including economics, education, and healthcare, by providing insights into human behavior and opinions. However, item non-response, where respondents fail to answer specific questions, presents a significant challenge by creating incomplete datasets that undermine data integrity and can hinder or even prevent accurate analysis. Traditional methods for addressing missing data, such as statistical imputation techniques and deep learning models, often fall short when dealing with the rich linguistic content of survey data. These approaches are also hampered by high time complexity for training and the need for extensive preprocessing or feature selection. In this paper, we introduce an approach that leverages Large Language Models (LLMs) through prompt engineering for predicting item non-responses in survey data. Our method combines the strengths of both traditional imputation techniques and deep learning methods with the advanced linguistic understanding of LLMs. By integrating respondent similarities, question relevance, and linguistic semantics, our approach enhances the accuracy and comprehensiveness of survey data analysis. The proposed method bypasses the need for complex preprocessing and additional training, making it adaptable, scalable, and capable of generating explainable predictions in natural language. We evaluated the effectiveness of our LLM-based approach through a series of experiments, demonstrating its competitive performance against established methods such as Multivariate Imputation by Chained Equations (MICE), MissForest, and deep learning models like TabTransformer. The results show that our approach not only matches but, in some cases, exceeds the performance of these methods while significantly reducing the time required for data processing.
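The abstract does not say how respondent similarity or question relevance are computed, nor what the prompt looks like. As a rough illustration only, and not the authors' pipeline, the sketch below assembles a prompt for one missing answer from the most similar respondents and the most relevant questions. Every name here (`respondent_similarity`, `question_relevance`, `build_prompt`, the sample survey) is hypothetical. Answer agreement rate and word overlap are simple stand-ins for whatever similarity and semantic measures the paper actually uses. The call to an LLM is deliberately left out.

```python
from typing import Dict, List, Optional

Answers = Dict[str, Optional[str]]  # question id -> answer, None if missing


def respondent_similarity(a: Answers, b: Answers) -> float:
    """Share of questions answered by both respondents on which they agree."""
    shared = [q for q in a if a[q] is not None and b.get(q) is not None]
    if not shared:
        return 0.0
    return sum(a[q] == b[q] for q in shared) / len(shared)


def question_relevance(target_text: str, other_text: str) -> float:
    """Jaccard overlap of word sets: a crude stand-in for semantic relevance."""
    t, o = set(target_text.lower().split()), set(other_text.lower().split())
    union = t | o
    return len(t & o) / len(union) if union else 0.0


def build_prompt(target_id: str, respondent: Answers, others: List[Answers],
                 questions: Dict[str, str], k_resp: int = 2, k_q: int = 2) -> str:
    """Assemble a prompt asking for the respondent's missing answer to target_id."""
    # Keep the k_q questions whose wording is closest to the target question.
    ranked_q = sorted(
        (q for q in questions if q != target_id),
        key=lambda q: question_relevance(questions[target_id], questions[q]),
        reverse=True,
    )[:k_q]
    # Keep the k_resp most similar respondents who did answer the target question.
    donors = sorted(
        (o for o in others if o.get(target_id) is not None),
        key=lambda o: respondent_similarity(respondent, o),
        reverse=True,
    )[:k_resp]

    lines = [
        "You are imputing a missing survey answer.",
        f"Target question: {questions[target_id]}",
        "Known answers from this respondent:",
    ]
    for q in ranked_q:
        if respondent.get(q) is not None:
            lines.append(f"- {questions[q]}: {respondent[q]}")
    lines.append("Answers from similar respondents:")
    for i, donor in enumerate(donors, start=1):
        parts = [f"{questions[q]}: {donor[q]}" for q in ranked_q
                 if donor.get(q) is not None]
        parts.append(f"{questions[target_id]}: {donor[target_id]}")
        lines.append(f"- Respondent {i}: " + "; ".join(parts))
    lines.append("Predict the most likely answer and briefly explain why.")
    return "\n".join(lines)


# Hypothetical toy survey: the respondent skipped q2.
questions = {
    "q1": "How satisfied are you with your job",
    "q2": "How satisfied are you with your salary",
    "q3": "How often do you exercise",
}
respondent = {"q1": "satisfied", "q2": None, "q3": "weekly"}
others = [
    {"q1": "satisfied", "q2": "satisfied", "q3": "weekly"},
    {"q1": "dissatisfied", "q2": "dissatisfied", "q3": "daily"},
    {"q1": "satisfied", "q2": None, "q3": "weekly"},
]
prompt = build_prompt("q2", respondent, others, questions, k_resp=1, k_q=1)
print(prompt)
```

The resulting string would be sent to an LLM of the reader's choice. Because no model is trained, the only tunable parts in this sketch are the two ranking functions and the cut-offs `k_resp` and `k_q`.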

Suggested Citation

Junyung Ji & Jiwoo Kim & Younghoon Kim, 2024. "Predicting Missing Values in Survey Data Using Prompt Engineering for Addressing Item Non-Response," Future Internet, MDPI, vol. 16(10), pages 1-19, September.

Handle: RePEc:gam:jftint:v:16:y:2024:i:10:p:351-:d:1487251

References listed on IDEAS

van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
Honaker, James & King, Gary & Blackwell, Matthew, 2011. "Amelia II: A Program for Missing Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i07).

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Lara Lopez & Fernando L. Vázquez & Ángela J. Torres & Patricia Otero & Vanessa Blanco & Olga Díaz & Mario Páramo, 2020. "Long-Term Effects of a Cognitive Behavioral Conference Call Intervention on Depression in Non-Professional Caregivers," IJERPH, MDPI, vol. 17(22), pages 1-24, November.
Schoemaker, Nikita K. & Juffer, Femmie & Rippe, Ralph C.A. & Vermeer, Harriet J. & Stoltenborgh, Marije & Jagersma, Gabrine J. & Maras, Athanasios & Alink, Lenneke R.A., 2020. "Positive parenting in foster care: Testing the effectiveness of a video-feedback intervention program on foster parents’ behavior and attitudes," Children and Youth Services Review, Elsevier, vol. 110(C).
Nicklas Pettersson, 2013. "Bias reduction of finite population imputation by kernel methods," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 14(1), pages 139-160, March.
Nicholas Tierney & Dianne Cook, 2018. "Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations," Monash Econometrics and Business Statistics Working Papers 14/18, Monash University, Department of Econometrics and Business Statistics.
Thelma Dede Baddoo & Zhijia Li & Samuel Nii Odai & Kenneth Rodolphe Chabi Boni & Isaac Kwesi Nooni & Samuel Ato Andam-Akorful, 2021. "Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation," IJERPH, MDPI, vol. 18(16), pages 1-26, August.
Cheng, Xiaoyue & Cook, Dianne & Hofmann, Heike, 2015. "Visually Exploring Missing Values in Multivariable Data Using a Graphical User Interface," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i06).
Mathur, Maya B & Shpitser, Ilya, 2024. "Pitfalls of imputing using incomplete auxiliary variables," OSF Preprints c3zrh, Center for Open Science.
Andreas Reitz, 2025. "Pursuing weak paths: a performance evaluation of multiple imputation in PLS," Quality & Quantity: International Journal of Methodology, Springer, vol. 59(4), pages 3377-3403, August.
Josse, Julie & Husson, François, 2016. "missMDA: A Package for Handling Missing Values in Multivariate Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i01).
Kowarik, Alexander & Templ, Matthias, 2016. "Imputation with the R Package VIM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i07).
World Bank & Organisation for Economic Co-operation and Development, 2017. "A Step Ahead," World Bank Publications - Books, The World Bank Group, number 27527.
Maria Lucia Parrella & Giuseppina Albano & Michele La Rocca & Cira Perna, 2019. "Reconstructing missing data sequences in multivariate time series: an application to environmental data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(2), pages 359-383, June.
Nadia B. Mendoza & Chii-Dean Lin & Susan M. Kiene & Nicolas A. Menzies & Rhoda K. Wanyenze & Katherine A. Schmarje & Rose Naigino & Michael Ediau & Seth C. Kalichman & Barbara A. Bailey, 2024. "Evaluating Imputation Methods to Improve Prediction Accuracy for an HIV Study in Uganda," Stats, MDPI, vol. 7(4), pages 1-16, November.
Christian Kurniawan & Xiyu Deng & Adhiraj Chakraborty & Assane Gueye & Niangjun Chen & Yorie Nakahira, 2022. "A Learning and Control Perspective for Microfinance," Papers 2207.12631, arXiv.org, revised Dec 2022.
Taesung Kim & Jinhee Kim & Wonho Yang & Hunjoo Lee & Jaegul Choo, 2021. "Missing Value Imputation of Time-Series Air-Quality Data via Deep Neural Networks," IJERPH, MDPI, vol. 18(22), pages 1-8, November.
Wan-Lun Wang & Victor Hugo Lachos & Yu-Chien Chen & Tsung-I Lin, 2025. "Flexible clustering via Gaussian parsimonious mixture models with censored and missing values," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 34(2), pages 431-458, June.
Koji Uchiyama & Yasuo Haruyama & Hiromi Shiraishi & Kiyohiko Katahira & Daiki Abukawa & Takashi Ishige & Hitoshi Tajiri & Keiichi Uchida & Kan Uchiyama & Masakazu Washio & Erika Kobashi & Atsuko Maeka, 2020. "Association between Passive Smoking from the Mother and Pediatric Crohn’s Disease: A Japanese Multicenter Study," IJERPH, MDPI, vol. 17(8), pages 1-10, April.
Noémi Kreif & Richard Grieve & Iván Díaz & David Harrison, 2015. "Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury," Health Economics, John Wiley & Sons, Ltd., vol. 24(9), pages 1213-1228, September.
Abhilash Bandam & Eedris Busari & Chloi Syranidou & Jochen Linssen & Detlef Stolten, 2022. "Classification of Building Types in Germany: A Data-Driven Modeling Approach," Data, MDPI, vol. 7(4), pages 1-23, April.
Boonstra Philip S. & Little Roderick J.A. & West Brady T. & Andridge Rebecca R. & Alvarado-Leiton Fernanda, 2021. "A Simulation Study of Diagnostics for Selection Bias," Journal of Official Statistics, Sciendo, vol. 37(3), pages 751-769, September.
