Printed from https://ideas.repec.org/a/spr/infosf/v24y2022i6d10.1007_s10796-021-10195-9.html

Framework for the Classification of Imbalanced Structured Data Using Under-sampling and Convolutional Neural Network

Authors

  • Yoon Sang Lee

    (Columbus State University)

  • Chulhwan Chris Bang

    (Auburn University at Montgomery)

Abstract

Among machine learning techniques, classification techniques are useful for various business applications, but classification algorithms perform poorly with imbalanced data. In this study, we propose a classification technique with improved binary classification performance on both the minority and majority classes of imbalanced structured data. The proposed framework is composed of three steps. In the first step, a balanced training set is created via under-sampling. Then, each example is converted into an image depicting a line graph. In the last step, a Convolutional Neural Network (CNN) is trained using the images. In the experiments, we selected six datasets from the UCI Repository and applied the proposed framework to them. The proposed model achieved the best receiver operating characteristic (ROC) curve on all six datasets and the best Balanced Accuracy (BA) on five of them. This demonstrates that the combination of under-sampling and CNNs is a viable approach for imbalanced structured data classification.
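
The three steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes plain random under-sampling applied to the training split only, assumes features are min-max scaled to [0, 1] before plotting, and uses a hypothetical line-graph encoding (`to_line_graph_image`, with assumed `height` and `col_step` parameters) in which feature i is placed at column i * col_step and its scaled value sets the row. The CNN itself is left out. `balanced_accuracy` follows the standard definition of BA: the mean of sensitivity (true positive rate) and specificity (true negative rate).

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_undersample(X: Sequence[Sequence[float]], y: Sequence[int],
                       seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Step 1 (assumed: plain random under-sampling of the training split).
    Every class is reduced to the size of the smallest class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[List[float]]] = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(list(row))
    n_min = min(len(rows) for rows in by_class.values())
    X_bal: List[List[float]] = []
    y_bal: List[int] = []
    for label in sorted(by_class):
        for row in rng.sample(by_class[label], n_min):
            X_bal.append(row)
            y_bal.append(label)
    return X_bal, y_bal


def minmax_scale(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Assumed preprocessing: scale each feature to [0, 1]; constant features map to 0.0."""
    lo = [min(col) for col in zip(*X)]
    hi = [max(col) for col in zip(*X)]
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)]
            for row in X]


def to_line_graph_image(row: Sequence[float], height: int = 16,
                        col_step: int = 4) -> List[List[int]]:
    """Step 2 (hypothetical encoding): draw one non-empty example as a line graph
    on a height x width binary grid. Feature i sits at column i * col_step;
    a value of 1.0 maps to the top row and 0.0 to the bottom row."""
    width = (len(row) - 1) * col_step + 1
    img = [[0] * width for _ in range(height)]
    pts = [(i * col_step, round((1.0 - min(max(v, 0.0), 1.0)) * (height - 1)))
           for i, v in enumerate(row)]
    img[pts[0][1]][pts[0][0]] = 1  # covers a single-feature row
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # DDA rasterization: step along the longer axis so the line stays connected.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for k in range(steps + 1):
            x = x0 + round(k * (x1 - x0) / steps)
            y = y0 + round(k * (y1 - y0) / steps)
            img[y][x] = 1
    return img


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                      positive: int = 1) -> float:
    """BA = (sensitivity + specificity) / 2; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

In use, `random_undersample` would run on the training data only, each balanced row would pass through `minmax_scale` and `to_line_graph_image`, and the resulting grids would train a CNN in whatever deep learning framework is at hand; BA would then be measured on the untouched, still-imbalanced test set. Shrinking every class to the minority size discards most majority examples, which is the usual trade-off of under-sampling.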

Suggested Citation

  • Yoon Sang Lee & Chulhwan Chris Bang, 2022. "Framework for the Classification of Imbalanced Structured Data Using Under-sampling and Convolutional Neural Network," Information Systems Frontiers, Springer, vol. 24(6), pages 1795-1809, December.
  • Handle: RePEc:spr:infosf:v:24:y:2022:i:6:d:10.1007_s10796-021-10195-9
    DOI: 10.1007/s10796-021-10195-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-021-10195-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10796-021-10195-9?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Chulhwan Chris Bang & Jaeung Lee & H. Raghav Rao, 2021. "The Egyptian protest movement in the twittersphere: An investigation of dual sentiment pathways of communication," International Journal of Information Management, Elsevier, vol. 58(C).
    2. Justin M. Johnson & Taghi M. Khoshgoftaar, 2020. "The Effects of Data Sampling with Deep Learning and Highly Imbalanced Big Data," Information Systems Frontiers, Springer, vol. 22(5), pages 1113-1131, October.
    3. Haiman Tian & Shu-Ching Chen & Mei-Ling Shyu, 2020. "Evolutionary Programming Based Deep Learning Feature Selection and Network Construction for Visual Data Classification," Information Systems Frontiers, Springer, vol. 22(5), pages 1053-1066, October.
    4. Hatice Kizgin & Ahmad Jamal & Bidit Lal Dey & Nripendra P. Rana, 2018. "The Impact of Social Media on Consumers’ Acculturation and Purchase Intentions," Information Systems Frontiers, Springer, vol. 20(3), pages 503-514, June.
    5. Yogesh K. Dwivedi & Gerald Kelly & Marijn Janssen & Nripendra P. Rana & Emma L. Slade & Marc Clement, 2018. "Social Media: The Good, the Bad, and the Ugly," Information Systems Frontiers, Springer, vol. 20(3), pages 419-423, June.
    6. Cheng-Kui Huang & Tawei Wang & Tzu-Yen Huang, 2020. "Initial Evidence on the Impact of Big Data Implementation on Firm Performance," Information Systems Frontiers, Springer, vol. 22(2), pages 475-487, April.
    7. Zhen-Yu Chen & Zhi-Ping Fan & Minghe Sun, 2012. "A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data," European Journal of Operational Research, Elsevier, vol. 223(2), pages 461-472.
    8. Matti Mäntymäki & Sami Hyrynsalmi & Antti Koskenvoima, 2020. "How Do Small and Medium-Sized Game Companies Use Analytics? An Attention-Based View of Game Analytics," Information Systems Frontiers, Springer, vol. 22(5), pages 1163-1178, October.
    9. Salima Smiti & Makram Soui, 2020. "Bankruptcy Prediction Using Deep Learning Approach Based on Borderline SMOTE," Information Systems Frontiers, Springer, vol. 22(5), pages 1067-1083, October.
    10. Mohammed Kuko & Mohammad Pourhomayoun, 2020. "Single and Clustered Cervical Cell Classification with Ensemble and Deep Learning Methods," Information Systems Frontiers, Springer, vol. 22(5), pages 1039-1051, October.
    11. Jun Liu & Prem Timsina & Omar El-Gayar, 2018. "A comparative analysis of semi-supervised learning: The case of article selection for medical systematic reviews," Information Systems Frontiers, Springer, vol. 20(2), pages 195-207, April.
    12. Prem Timsina & Jun Liu & Omar El-Gayar, 2016. "Advanced analytics for the automation of medical systematic reviews," Information Systems Frontiers, Springer, vol. 18(2), pages 237-252, April.
    13. Chrysanthos Dellarocas & Charles A. Wood, 2008. "The Sound of Silence in Online Feedback: Estimating Trading Risks in the Presence of Reporting Bias," Management Science, INFORMS, vol. 54(3), pages 460-476, March.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lydia Bouzar-Benlabiod & Stuart H. Rubin, 2020. "Heuristic Acquisition for Data Science," Information Systems Frontiers, Springer, vol. 22(5), pages 1001-1007, October.
    2. Christian Kauten & Ashish Gupta & Xiao Qin & Glenn Richey, 2022. "Predicting Blood Donors Using Machine Learning Techniques," Information Systems Frontiers, Springer, vol. 24(5), pages 1547-1562, October.
    3. Jithesh Arayankalam & Satish Krishnan, 2023. "ICT-Based Country-Level Determinants of Social Media Diffusion," Information Systems Frontiers, Springer, vol. 25(5), pages 1881-1902, October.
    4. Patrick Mikalef & Kshitij Sharma & Ilias O. Pappas & Michail Giannakos, 2021. "Seeking Information on Social Commerce: An Examination of the Impact of User- and Marketer-generated Content Through an Eye-tracking Study," Information Systems Frontiers, Springer, vol. 23(5), pages 1273-1286, September.
    5. Xiongfei Cao & Ali Nawaz Khan & Ahsan Ali & Naseer Abbas Khan, 0. "Consequences of Cyberbullying and Social Overload while Using SNSs: A Study of Users’ Discontinuous Usage Behavior in SNSs," Information Systems Frontiers, Springer, vol. 0, pages 1-14.
    6. R. Elakkiya & Pandi Vijayakumar & Marimuthu Karuppiah, 2021. "COVID_SCREENET: COVID-19 Screening in Chest Radiography Images Using Deep Transfer Stacking," Information Systems Frontiers, Springer, vol. 23(6), pages 1369-1383, December.
    7. Xiongfei Cao & Ali Nawaz Khan & Ahsan Ali & Naseer Abbas Khan, 2020. "Consequences of Cyberbullying and Social Overload while Using SNSs: A Study of Users’ Discontinuous Usage Behavior in SNSs," Information Systems Frontiers, Springer, vol. 22(6), pages 1343-1356, December.
    8. A. Geethapriya & S. Valli, 2021. "An Enhanced Approach to Map Domain-Specific Words in Cross-Domain Sentiment Analysis," Information Systems Frontiers, Springer, vol. 23(3), pages 791-805, June.
    9. Patrick Mikalef & Kshitij Sharma & Ilias O. Pappas & Michail Giannakos, 0. "Seeking Information on Social Commerce: An Examination of the Impact of User- and Marketer-generated Content Through an Eye-tracking Study," Information Systems Frontiers, Springer, vol. 0, pages 1-14.
    10. Hussain, Hadia & Murtaza, Murtaza & Ajmal, Areeb & Ahmed, Afreen & Khan, Muhammad Ovais Khalid, 2020. "A study on the effects of social media advertisement on consumer’s attitude and customer response," MPRA Paper 104675, University Library of Munich, Germany.
    11. Pierre Fleckinger & Matthieu Glachant & Gabrielle Moineville, 2017. "Incentives for Quality in Friendly and Hostile Informational Environments," American Economic Journal: Microeconomics, American Economic Association, vol. 9(1), pages 242-274, February.
    12. Kunpeng Zhang & Wendy Moe, 2021. "Measuring Brand Favorability Using Large-Scale Social Media Data," Information Systems Research, INFORMS, vol. 32(4), pages 1128-1139, December.
    13. Tobias Gesche, 2022. "Reference‐price shifts and customer antagonism: Evidence from reviews for online auctions," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 31(3), pages 558-578, August.
    14. Anke Joubert & Matthias Murawski & Markus Bick, 2023. "Measuring the Big Data Readiness of Developing Countries – Index Development and its Application to Africa," Information Systems Frontiers, Springer, vol. 25(1), pages 327-350, February.
    15. Claudia Keser & Maximilian Späth, 2020. "The Value of Bad Ratings: An Experiment on the Impact of Distortions in Reputation Systems," CIRANO Working Papers 2020s-22, CIRANO.
    16. Christoph Safferling & Aaron Lowen, 2011. "Economics in the Kingdom of Loathing: Analysis of Virtual Market Data," Working Paper Series of the Department of Economics, University of Konstanz 2011-30, Department of Economics, University of Konstanz.
    17. Judy E. Scott & Dawn G. Gregg & Jae Hoon Choi, 2015. "Lemon complaints: When online auctions go sour," Information Systems Frontiers, Springer, vol. 17(1), pages 177-191, February.
    18. Gary Bolton & Ben Greiner & Axel Ockenfels, 2015. "Conflict resolution vs. conflict escalation in online markets," Discussion Papers 2015-19, School of Economics, The University of New South Wales.
    19. Wang, Yan & Yang, Jian & Qi, Lian, 2017. "A game-theoretic model for the role of reputation feedback systems in peer-to-peer commerce," International Journal of Production Economics, Elsevier, vol. 191(C), pages 178-193.
    20. Zhen-Yu Chen & Xin-Li Liu & Li-Ping Yin, 2023. "Data-driven product configuration improvement and product line restructuring with text mining and multitask learning," Journal of Intelligent Manufacturing, Springer, vol. 34(4), pages 2043-2059, April.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:infosf:v:24:y:2022:i:6:d:10.1007_s10796-021-10195-9. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.