
Selecting rows and columns for training support vector regression models with large retail datasets

Author

Listed:
  • Gür Ali, Özden
  • Yaman, Kübra

Abstract

Although support vector regression models are used successfully in various applications, business datasets with millions of observations and thousands of variables make training them difficult, if not impossible. This paper introduces the Row and Column Selection Algorithm (ROCSA) to select a small but informative dataset for training support vector regression models with standard SVM tools. ROCSA uses ε-SVR models with L1-norm regularization of the dual and primal variables for the row and column selection steps, respectively. The row selection step processes data chunks in parallel and selects a fraction of the original observations that either are representative of the pattern identified in the chunk or do not fit that pattern. The column selection step dramatically reduces the number of variables and the multicollinearity in the dataset, increasing the interpretability of the resulting models and their ease of maintenance. Evaluated on six retail datasets from two countries and a publicly available research dataset, the reduced ROCSA training data improves predictive accuracy by 39% on average compared with the original dataset when trained with standard SVM tools. Comparison with the ε-SSVR method using the reduced kernel technique shows a similar performance improvement. Training a standard SVM tool with the ROCSA-selected observations improves predictive accuracy by 21% on average compared with the practical approach of random sampling.
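
The chunk-wise row selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' ROCSA implementation: the per-chunk model here is an ordinary least-squares line standing in for the L1-regularized ε-SVR, and the rule for picking "representative" rows (the best-fitting rows in each chunk) is an assumption made for illustration, since the abstract does not specify it. The names fit_line, select_rows, chunk_size, eps and n_representatives are all hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares line y = a*x + b; a stand-in for the per-chunk SVR model."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def select_rows(rows, chunk_size, eps, n_representatives=1):
    """Return the indices of the rows kept by chunk-wise selection.

    rows is a list of (x, y) pairs. Each chunk is handled independently,
    so the loop body could be dispatched to parallel workers.
    """
    kept = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        a, b = fit_line([x for x, _ in chunk], [y for _, y in chunk])
        residuals = [abs(y - (a * x + b)) for x, y in chunk]
        # Rows outside the eps-tube do not fit the chunk's pattern.
        outside = {i for i, r in enumerate(residuals) if r > eps}
        # Assumption: the best-fitting rows stand in for "representative" rows.
        by_fit = sorted(range(len(chunk)), key=lambda i: residuals[i])
        representatives = set(by_fit[:n_representatives])
        kept.extend(start + i for i in sorted(outside | representatives))
    return kept


rows = [(0, 0), (1, 1), (2, 7), (3, 3), (4, 4),       # chunk 1: one outlier
        (5, 10), (6, 12), (7, 14), (8, 16), (9, 18)]  # chunk 2: exact line
selected = select_rows(rows, chunk_size=5, eps=2.0)
print(selected)  # [0, 2, 5]
```

In this toy run, 3 of the 10 rows survive: one representative per chunk plus the single row that lies outside the ε-tube. The column selection step is not covered by this sketch.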

Suggested Citation

  • Gür Ali, Özden & Yaman, Kübra, 2013. "Selecting rows and columns for training support vector regression models with large retail datasets," European Journal of Operational Research, Elsevier, vol. 226(3), pages 471-480.
  • Handle: RePEc:eee:ejores:v:226:y:2013:i:3:p:471-480
    DOI: 10.1016/j.ejor.2012.11.013

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221712008375
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Carbonneau, Real & Laframboise, Kevin & Vahidov, Rustam, 2008. "Application of machine learning techniques for supply chain demand forecasting," European Journal of Operational Research, Elsevier, vol. 184(3), pages 1140-1154, February.
    2. Lu, Chi-Jie & Wang, Yen-Wen, 2010. "Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting," International Journal of Production Economics, Elsevier, vol. 128(2), pages 603-613, December.
    3. Wu, Shaomin & Akbarov, Artur, 2011. "Support vector regression for warranty claim forecasting," European Journal of Operational Research, Elsevier, vol. 213(1), pages 196-204, August.

    Citations



    Cited by:

    1. Bottmer, Lea & Croux, Christophe & Wilms, Ines, 2022. "Sparse regression for large data sets with outliers," European Journal of Operational Research, Elsevier, vol. 297(2), pages 782-794.
    2. Ulrich, Matthias & Jahnke, Hermann & Langrock, Roland & Pesch, Robert & Senge, Robin, 2022. "Classification-based model selection in retail demand forecasting," International Journal of Forecasting, Elsevier, vol. 38(1), pages 209-223.
    3. Ma, Shaohui & Fildes, Robert, 2021. "Retail sales forecasting with meta-learning," European Journal of Operational Research, Elsevier, vol. 288(1), pages 111-128.
    4. Guillaume Coqueret & Tony Guida, 2020. "Training trees on tails with applications to portfolio choice," Annals of Operations Research, Springer, vol. 288(1), pages 181-221, May.
    5. Fildes, Robert & Ma, Shaohui & Kolassa, Stephan, 2022. "Retail forecasting: Research and practice," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1283-1318.
    6. Ma, Shaohui & Fildes, Robert, 2017. "A retail store SKU promotions optimization model for category multi-period profit maximization," European Journal of Operational Research, Elsevier, vol. 260(2), pages 680-692.
    7. Gür Ali, Özden & Gürlek, Ragıp, 2020. "Automatic Interpretable Retail forecasting with promotional scenarios," International Journal of Forecasting, Elsevier, vol. 36(4), pages 1389-1406.
    8. Fildes, Robert & Ma, Shaohui & Kolassa, Stephan, 2019. "Retail forecasting: research and practice," MPRA Paper 89356, University Library of Munich, Germany.
    9. Ma, Shaohui & Fildes, Robert, 2020. "Forecasting third-party mobile payments with implications for customer flow prediction," International Journal of Forecasting, Elsevier, vol. 36(3), pages 739-760.
    10. Semenoglou, Artemios-Anargyros & Spiliotis, Evangelos & Makridakis, Spyros & Assimakopoulos, Vassilios, 2021. "Investigating the accuracy of cross-learning time series forecasting methods," International Journal of Forecasting, Elsevier, vol. 37(3), pages 1072-1084.
    11. Wellens, Arnoud P. & Udenio, Maxi & Boute, Robert N., 2022. "Transfer learning for hierarchical forecasting: Reducing computational efforts of M5 winning methods," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1482-1491.
    12. Gur Ali, Ozden & Pinar, Efe, 2016. "Multi-period-ahead forecasting with residual extrapolation and information sharing — Utilizing a multitude of retail series," International Journal of Forecasting, Elsevier, vol. 32(2), pages 502-517.
    13. Guillaume Coqueret & Tony Guida, 2020. "Training trees on tails with applications to portfolio choice," Post-Print hal-04144665, HAL.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cang, Shuang & Yu, Hongnian, 2014. "A combination selection algorithm on forecasting," European Journal of Operational Research, Elsevier, vol. 234(1), pages 127-139.
    2. Yang, Duo & He, Zhen & He, Shuguang, 2016. "Warranty claims forecasting based on a general imperfect repair model considering usage rate," Reliability Engineering and System Safety, Elsevier, vol. 145(C), pages 147-154.
    3. Wang, Binni & Wang, Pong & Tu, Yiliu, 2021. "Customer satisfaction service match and service quality-based blockchain cloud manufacturing," International Journal of Production Economics, Elsevier, vol. 240(C).
    4. Sangho Lee & Youngdoo Son, 2021. "Motor Load Balancing with Roll Force Prediction for a Cold-Rolling Setup with Neural Networks," Mathematics, MDPI, vol. 9(12), pages 1-21, June.
    5. Wu, Shaomin, 2014. "Construction of asymmetric copulas and its application in two-dimensional reliability modelling," European Journal of Operational Research, Elsevier, vol. 238(2), pages 476-485.
    6. Zhou, Chongwen & Chinnam, Ratna Babu & Dalkiran, Evrim & Korostelev, Alexander, 2017. "Bayesian approach to hazard rate models for early detection of warranty and reliability problems using upstream supply chain information," International Journal of Production Economics, Elsevier, vol. 193(C), pages 316-331.
    7. Huber, Jakob & Stuckenschmidt, Heiner, 2020. "Daily retail demand forecasting using machine learning with emphasis on calendric special days," International Journal of Forecasting, Elsevier, vol. 36(4), pages 1420-1438.
    8. Lechtenberg, Sandra & de Siqueira Braga, Diego & Hellingrath, Bernd, 2019. "Automatic identification system (AIS) data based ship-supply forecasting," Chapters from the Proceedings of the Hamburg International Conference of Logistics (HICL), in: Jahn, Carlos & Kersten, Wolfgang & Ringle, Christian M. (ed.), Digital Transformation in Maritime and City Logistics: Smart Solutions for Logistics. Proceedings of the Hamburg International Conference of Logistics, volume 28, pages 3-24, Hamburg University of Technology (TUHH), Institute of Business Logistics and General Management.
    9. Jihane El Ouadi & Hanae Errousso & Nicolas Malhene & Siham Benhadou & Hicham Medromi, 2022. "A machine-learning based hybrid algorithm for strategic location of urban bundling hubs to support shared public transport," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(5), pages 3215-3258, October.
    10. Bin Shen & Hau-Ling Chan, 2017. "Forecast Information Sharing for Managing Supply Chains in the Big Data Era: Recent Development and Future Research," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 34(01), pages 1-26, February.
    11. Kate Murray & Andrea Rossi & Diego Carraro & Andrea Visentin, 2023. "On Forecasting Cryptocurrency Prices: A Comparison of Machine Learning, Deep Learning, and Ensembles," Forecasting, MDPI, vol. 5(1), pages 1-14, January.
    12. Chehade, Abdallah & Savargaonkar, Mayuresh & Krivtsov, Vasiliy, 2022. "Conditional Gaussian mixture model for warranty claims forecasting," Reliability Engineering and System Safety, Elsevier, vol. 218(PB).
    13. Theresa Maria Rausch & Tobias Albrecht & Daniel Baier, 2022. "Beyond the beaten paths of forecasting call center arrivals: on the use of dynamic harmonic regression with predictor variables," Journal of Business Economics, Springer, vol. 92(4), pages 675-706, May.
    14. Oztekin, Asil & Kizilaslan, Recep & Freund, Steven & Iseri, Ali, 2016. "A data analytic approach to forecasting daily stock returns in an emerging market," European Journal of Operational Research, Elsevier, vol. 253(3), pages 697-710.
    15. Oscar Claveria & Enric Monte & Salvador Torra, 2015. "Self-organizing map analysis of agents’ expectations. Different patterns of anticipation of the 2008 financial crisis," AQR Working Papers 201508, University of Barcelona, Regional Quantitative Analysis Group, revised Mar 2015.
    16. Deniz Preil & Michael Krapp, 2022. "Artificial intelligence-based inventory management: a Monte Carlo tree search approach," Annals of Operations Research, Springer, vol. 308(1), pages 415-439, January.
    17. Kandaswamy Paramasivan & Rahul Subburaj & Saish Jaiswal & Nandan Sudarsanam, 2022. "Empirical evidence of the impact of mobility on property crimes during the first two waves of the COVID-19 pandemic," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-14, December.
    18. Chattopadhyay, Manojit & Kumar Mitra, Subrata, 2015. "Exploring asymmetric behavior pattern from Indian oil products prices using NARDL and GHSOM approaches," Energy Policy, Elsevier, vol. 86(C), pages 262-272.
    19. David Dilts & James Moore, 2009. "Do Arbitrators Use Just Cause Standards in Deciding Discharge and Discipline Cases? A Test," Journal of Labor Research, Springer, vol. 30(3), pages 245-261, September.
    20. Yuqi Dong & Xuejiao Ma & Chenchen Ma & Jianzhou Wang, 2016. "Research and Application of a Hybrid Forecasting Model Based on Data Decomposition for Electrical Load Forecasting," Energies, MDPI, vol. 9(12), pages 1-30, December.
