Predicting Countries’ Development Levels Using the Decision Tree and Random Forest Methods

Author

Listed:

Batuhan Özkan
(Yıldız Teknik Üniversitesi, Fen Edebiyat Fakültesi, İstatistik Bölümü, İstanbul, Türkiye.)
Coşkun Parim
(Yıldız Teknik Üniversitesi, Fen Edebiyat Fakültesi, İstatistik Bölümü, İstanbul, Türkiye.)
Erhan Çene
(Yıldız Teknik Üniversitesi, Fen Edebiyat Fakültesi, İstatistik Bölümü, İstanbul, Türkiye.)

Abstract

A close relationship exists between countries’ development levels and their economic levels. Countries can be examined according to various criteria and grouped by level of development, from underdeveloped to highly developed. Socioeconomic factors generally play a decisive role in determining countries’ levels of development. Although the level of development is determined with the help of socioeconomic variables, different organizations (e.g., the United Nations [UN] and the International Monetary Fund [IMF]) may classify countries using different methods. As a result, a country’s development level can fall into different categories depending on the method used and the organization applying it. The aim of this study is to propose a machine learning model that predicts the development level of 193 countries. Development level comprises four categories: high income, upper middle income, lower middle income, and low income. The 26 variables that affect countries’ development levels were obtained from the World Development Indicators (WDI) database. First, random forest-based variable importance was used to identify the variables with the greatest effect on countries’ development levels. Countries’ development levels were then classified using decision tree and random forest algorithms, with the most important variables selected through variable importance as inputs. The model built with the random forest algorithm correctly classified countries’ development levels with an accuracy of 70%. In addition, the findings show that adolescent fertility rate, total fertility rate, and the share of agriculture, forestry, and fisheries in gross domestic product (GDP) are the most important variables affecting countries’ development levels.
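The two-step workflow described in the abstract (rank variables by random forest importance, then classify with a decision tree and a random forest on the selected variables) might be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the WDI indicators, and the cutoff `top_k`, the 70/30 split, the tree depth, and the number of trees are all hypothetical choices that the abstract does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the WDI data (assumption): 193 "countries",
# 26 indicators, and 4 income groups encoded as classes 0..3.
X, y = make_classification(
    n_samples=193, n_features=26, n_informative=8,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: rank the indicators by impurity-based random forest importance.
ranker = RandomForestClassifier(n_estimators=500, random_state=0)
ranker.fit(X_train, y_train)
top_k = 10  # hypothetical cutoff; the abstract does not state how many were kept
selected = np.argsort(ranker.feature_importances_)[::-1][:top_k]

# Step 2: fit both classifiers on the selected indicators only.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
}
accuracy = {}
for name, model in models.items():
    model.fit(X_train[:, selected], y_train)
    predictions = model.predict(X_test[:, selected])
    accuracy[name] = accuracy_score(y_test, predictions)

print("selected indicator columns:", selected)
print("held-out accuracy:", accuracy)
```

The accuracies printed here describe only the synthetic data and say nothing about the 70% figure reported in the study. With real indicators, the column indices in `selected` would be mapped back to variable names such as adolescent fertility rate.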

Suggested Citation

Batuhan Özkan & Coşkun Parim & Erhan Çene, 2023. "Predicting Countries’ Development Levels Using the Decision Tree and Random Forest Methods," EKOIST Journal of Econometrics and Statistics, Istanbul University, Faculty of Economics, vol. 0(38), pages 87-104, June.

Handle: RePEc:ist:ekoist:v:0:y:2023:i:38:p:87-104
DOI: 10.26650/ekoist.2023.38.1172190

References listed on IDEAS

Ariel Kleiner & Ameet Talwalkar & Purnamrita Sarkar & Michael I. Jordan, 2014. "A scalable bootstrap for massive data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(4), pages 795-816, September.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Guangbao Guo & Yue Sun & Xuejun Jiang, 2020. "A partitioned quasi-likelihood for distributed statistical inference," Computational Statistics, Springer, vol. 35(4), pages 1577-1596, December.
Xingcai Zhou & Zhaoyang Jing & Chao Huang, 2024. "Distributed Bootstrap Simultaneous Inference for High-Dimensional Quantile Regression," Mathematics, MDPI, vol. 12(5), pages 1-54, February.
Dean Eckles & Maurits Kaptein, 2019. "Bootstrap Thompson Sampling and Sequential Decision Problems in the Behavioral Sciences," SAGE Open, vol. 9(2), pages 21582440198, June.
Badruddoza, Syed & Amin, Modhurima & McCluskey, Jill, 2019. "Assessing the Importance of an Attribute in a Demand System: Structural Model versus Machine Learning," Working Papers 2019-5, School of Economic Sciences, Washington State University.
Olhede, Sofia C. & Wolfe, Patrick J., 2018. "The future of statistics and data science," Statistics & Probability Letters, Elsevier, vol. 136(C), pages 46-50.
Shi, Chengchun & Lu, Wenbin & Song, Rui, 2018. "A massive data framework for M-estimators with cubic-rate," LSE Research Online Documents on Economics 102111, London School of Economics and Political Science, LSE Library.
Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
Dimitris N Politis, 2024. "Scalable subsampling: computation, aggregation and inference," Biometrika, Biometrika Trust, vol. 111(1), pages 347-354.
Mercè Crosas & Gary King & James Honaker & Latanya Sweeney, 2015. "Automating Open Science for Big Data," The ANNALS of the American Academy of Political and Social Science, vol. 659(1), pages 260-273, May.
Amalan Mahendran & Helen Thompson & James M. McGree, 2023. "A model robust subsampling approach for Generalised Linear Models in big data settings," Statistical Papers, Springer, vol. 64(4), pages 1137-1157, August.
Villoria, Nelson B. & Liu, Jing, 2018. "Using spatially explicit data to improve our understanding of land supply responses: An application to the cropland effects of global sustainable irrigation in the Americas," Land Use Policy, Elsevier, vol. 75(C), pages 411-419.
Vaughan, Gregory, 2020. "Efficient big data model selection with applications to fraud detection," International Journal of Forecasting, Elsevier, vol. 36(3), pages 1116-1127.
Wang, Xiaoqian & Kang, Yanfei & Hyndman, Rob J. & Li, Feng, 2023. "Distributed ARIMA models for ultra-long time series," International Journal of Forecasting, Elsevier, vol. 39(3), pages 1163-1184.
- Xiaoqian Wang & Yanfei Kang & Rob J Hyndman & Feng Li, 2020. "Distributed ARIMA Models for Ultra-long Time Series," Monash Econometrics and Business Statistics Working Papers 29/20, Monash University, Department of Econometrics and Business Statistics.
Zhang, Likun & Castillo, Enrique del & Berglund, Andrew J. & Tingley, Martin P. & Govind, Nirmal, 2020. "Computing confidence intervals from massive data via penalized quantile smoothing splines," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
Milica Maricic & Jose A. Egea & Veljko Jeremic, 2019. "A Hybrid Enhanced Scatter Search—Composite I-Distance Indicator (eSS-CIDI) Optimization Approach for Determining Weights Within Composite Indicators," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 144(2), pages 497-537, July.
Yang, Xinfeng & Yan, Xiaodong & Huang, Jian, 2019. "High-dimensional integrative analysis with homogeneity and sparsity recovery," Journal of Multivariate Analysis, Elsevier, vol. 174(C).
Lee, JooChul & Wang, HaiYing & Schifano, Elizabeth D., 2020. "Online updating method to correct for measurement error in big data streams," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
Baihua He & Yanyan Liu & Guosheng Yin & Yuanshan Wu, 2023. "Model aggregation for doubly divided data with large size and large dimension," Computational Statistics, Springer, vol. 38(1), pages 509-529, March.
Fang, Jianglin, 2023. "A split-and-conquer variable selection approach for high-dimensional general semiparametric models with massive data," Journal of Multivariate Analysis, Elsevier, vol. 194(C).
Xuejun Ma & Shaochen Wang & Wang Zhou, 2022. "Statistical inference in massive datasets by empirical likelihood," Computational Statistics, Springer, vol. 37(3), pages 1143-1164, July.

More about this item

Keywords

Development Level; Decision Tree; Random Forest; Fertility Rate; Machine Learning;
