Printed from https://ideas.repec.org/a/wsi/ijitdm/v18y2019i05ns0219622019500287.html

Forecasting Chinese Stock Market Prices using Baidu Search Index with a Learning-Based Data Collection Method

Authors
  • Jichang Dong

    (School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, P. R. China)

  • Wei Dai

    (School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, P. R. China)

  • Ying Liu

    (School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, P. R. China; The Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, P. R. China)

  • Lean Yu

    (School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China)

  • Jie Wang

    (Department of Civil and Environmental Engineering, Stanford University, Stanford, CA 94305, USA)

Abstract

In this study, to address the problems of search index selection and volatility, we propose a learning-based search index collection method that selects the fraction of search data used for modeling by learning the best selection criteria from robust statistics. Based on the fraction of the search index collected from the Baidu.com internet search engine, a novel model is formulated for Chinese stock market price forecasting. We empirically test our method on the two main Chinese stock market price indexes and find that its prediction accuracy is equivalent or superior to that of benchmarks from previous studies that used alternative search index collection methods or lagged-data prediction models. All prediction results highlight the importance of an effective data collection method for the robustness of forecast models and demonstrate the utility of a learning-based collection method for addressing the search index collection problem, leading to a significant improvement in Chinese stock market price prediction accuracy.

Suggested Citation

  • Jichang Dong & Wei Dai & Ying Liu & Lean Yu & Jie Wang, 2019. "Forecasting Chinese Stock Market Prices using Baidu Search Index with a Learning-Based Data Collection Method," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(05), pages 1605-1629, September.
  • Handle: RePEc:wsi:ijitdm:v:18:y:2019:i:05:n:s0219622019500287
    DOI: 10.1142/S0219622019500287

    Download full text from publisher

    File URL: http://www.worldscientific.com/doi/abs/10.1142/S0219622019500287
    Download Restriction: Access to full text is restricted to subscribers

    File URL: https://libkey.io/10.1142/S0219622019500287?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Nikolaos Askitas & Klaus F. Zimmermann, 2009. "Google Econometrics and Unemployment Forecasting," Applied Economics Quarterly (formerly: Konjunkturpolitik), Duncker & Humblot, Berlin, vol. 55(2), pages 107-120.
    2. Smith, Geoffrey Peter, 2012. "Google Internet search activity and volatility prediction in the market for foreign currency," Finance Research Letters, Elsevier, vol. 9(2), pages 103-110.
    3. Zhu, Bangzhu & Wei, Yiming, 2013. "Carbon price forecasting with a novel hybrid ARIMA and least squares support vector machines methodology," Omega, Elsevier, vol. 41(3), pages 517-524.
    4. D’Amuri, Francesco & Marcucci, Juri, 2010. "“Google it!” Forecasting the US Unemployment Rate with a Google Job Search index," Global Challenges Papers 60680, Fondazione Eni Enrico Mattei (FEEM).
    5. D'Amuri, Francesco & Marcucci, Juri, 2009. "‘Google it!’ Forecasting the US unemployment rate with a Google job search index," ISER Working Paper Series 2009-32, Institute for Social and Economic Research.
    6. Andrea Freyer Dugas & Mehdi Jalalpour & Yulia Gel & Scott Levin & Fred Torcaso & Takeru Igusa & Richard E Rothman, 2013. "Influenza Forecasting with Google Flu Trends," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-7, February.
    7. Ilaria Bordino & Stefano Battiston & Guido Caldarelli & Matthieu Cristelli & Antti Ukkonen & Ingmar Weber, 2012. "Web Search Queries Can Predict Stock Market Volumes," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-17, July.
    8. Zhang, Guoqiang & Eddy Patuwo, B. & Y. Hu, Michael, 1998. "Forecasting with artificial neural networks: The state of the art," International Journal of Forecasting, Elsevier, vol. 14(1), pages 35-62, March.
    9. Diebold, Francis X & Mariano, Roberto S, 2002. "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(1), pages 134-144, January.
    10. Ying Liu & Yibing Chen & Sheng Wu & Geng Peng & Benfu Lv, 2015. "Composite leading search index: a preprocessing method of internet search data for stock trends prediction," Annals of Operations Research, Springer, vol. 234(1), pages 77-94, November.
    11. Sarat Chandra Nayak & Bijan Bihari Misra, 2018. "Estimating stock closing indices using a GA-weighted condensed polynomial neural network," Financial Innovation, Springer; Southwestern University of Finance and Economics, vol. 4(1), pages 1-22, December.
    12. Zhi Da & Joseph Engelberg & Pengjie Gao, 2011. "In Search of Attention," Journal of Finance, American Finance Association, vol. 66(5), pages 1461-1499, October.
    13. Simeon Vosen & Torsten Schmidt, 2011. "Forecasting private consumption: survey‐based indicators vs. Google trends," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 30(6), pages 565-578, September.
    14. Hyunyoung Choi & Hal Varian, 2012. "Predicting the Present with Google Trends," The Economic Record, The Economic Society of Australia, vol. 88(s1), pages 2-9, June.
    15. Dehua Shen & Yongjie Zhang & Xiong Xiong & Wei Zhang, 2017. "Baidu index and predictability of Chinese stock returns," Financial Innovation, Springer; Southwestern University of Finance and Economics, vol. 3(1), pages 1-8, December.
    16. Pai, Ping-Feng & Lin, Chih-Sheng, 2005. "A hybrid ARIMA and support vector machines model in stock price forecasting," Omega, Elsevier, vol. 33(6), pages 497-505, December.
    17. Yu, Lean & Wang, Shouyang & Lai, Kin Keung, 2008. "Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm," Energy Economics, Elsevier, vol. 30(5), pages 2623-2635, September.
    18. Zhang, Ningning & Lin, Aijing & Shang, Pengjian, 2017. "Multidimensional k-nearest neighbor model based on EEMD for financial time series forecasting," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 477(C), pages 161-173.
    19. Bijl, Laurens & Kringhaug, Glenn & Molnár, Peter & Sandvik, Eirik, 2016. "Google searches and stock returns," International Review of Financial Analysis, Elsevier, vol. 45(C), pages 150-156.
    20. Tay, Francis E. H. & Cao, Lijuan, 2001. "Application of support vector machines in financial time series forecasting," Omega, Elsevier, vol. 29(4), pages 309-317, August.
    21. Leigh, W. & Paz, M. & Purvis, R., 2002. "An analysis of a hybrid neural network and pattern recognition technique for predicting short-term increases in the NYSE composite index," Omega, Elsevier, vol. 30(2), pages 69-76, April.
    22. Wei Huang & Kin Keung Lai & Yoshiteru Nakamori & Shouyang Wang & Lean Yu, 2007. "Neural Networks In Finance And Economics Forecasting," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 6(01), pages 113-140.
    23. Ling Tang & Wei Dai & Lean Yu & Shouyang Wang, 2015. "A Novel CEEMD-Based EELM Ensemble Learning Paradigm for Crude Oil Price Forecasting," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 14(01), pages 141-169.
    24. Nuno Barreira & Pedro Godinho & Paulo Melo, 2013. "Nowcasting unemployment rate and new car sales in south-western Europe with Google Trends," Netnomics, Springer, vol. 14(3), pages 129-165, November.
    25. Jeremy Ginsberg & Matthew H. Mohebbi & Rajan S. Patel & Lynnette Brammer & Mark S. Smolinski & Larry Brilliant, 2009. "Detecting influenza epidemics using search engine query data," Nature, Nature, vol. 457(7232), pages 1012-1014, February.
    26. Dzielinski, Michal, 2012. "Measuring economic uncertainty and its impact on the stock market," Finance Research Letters, Elsevier, vol. 9(3), pages 167-175.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Coble, David & Pincheira, Pablo, 2017. "Nowcasting Building Permits with Google Trends," MPRA Paper 76514, University Library of Munich, Germany.
    2. D’Amuri, Francesco & Marcucci, Juri, 2017. "The predictive power of Google searches in forecasting US unemployment," International Journal of Forecasting, Elsevier, vol. 33(4), pages 801-816.
    3. Bangwayo-Skeete, Prosper F. & Skeete, Ryan W., 2015. "Can Google data improve the forecasting performance of tourist arrivals? Mixed-data sampling approach," Tourism Management, Elsevier, vol. 46(C), pages 454-464.
    4. Thomas Dimpfl & Tobias Langen, 2019. "How Unemployment Affects Bond Prices: A Mixed Frequency Google Nowcasting Approach," Computational Economics, Springer; Society for Computational Economics, vol. 54(2), pages 551-573, August.
    5. Jaroslav Pavlicek & Ladislav Kristoufek, 2014. "Can Google searches help nowcast and forecast unemployment rates in the Visegrad Group countries?," Papers 1408.6639, arXiv.org.
    6. Yang, Xin & Pan, Bing & Evans, James A. & Lv, Benfu, 2015. "Forecasting Chinese tourist volume with search engine data," Tourism Management, Elsevier, vol. 46(C), pages 386-397.
    7. Long Wen & Chang Liu & Haiyan Song, 2019. "Forecasting tourism demand using search query data: A hybrid modelling approach," Tourism Economics, vol. 25(3), pages 309-329, May.
    8. Jaroslav Pavlicek & Ladislav Kristoufek, 2015. "Nowcasting Unemployment Rates with Google Searches: Evidence from the Visegrad Group Countries," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-11, May.
    9. Bai, Lijuan & Yan, Xiangbin & Yu, Guang, 2019. "Impact of CEO media appearance on corporate performance in social media," The North American Journal of Economics and Finance, Elsevier, vol. 50(C).
    10. Gomes, Pedro & Taamouti, Abderrahim, 2016. "In search of the determinants of European asset market comovements," International Review of Economics & Finance, Elsevier, vol. 44(C), pages 103-117.
    11. Daniel Borup & Erik Christian Montes Schütte, 2019. "In search of a job: Forecasting employment growth using Google Trends," CREATES Research Papers 2019-13, Department of Economics and Business Economics, Aarhus University.
    12. Papadamou, Stephanos & Fassas, Athanasios & Kenourgios, Dimitris & Dimitriou, Dimitrios, 2020. "Direct and Indirect Effects of COVID-19 Pandemic on Implied Stock Market Volatility: Evidence from Panel Data Analysis," MPRA Paper 100020, University Library of Munich, Germany.
    13. Halousková, Martina & Stašek, Daniel & Horváth, Matúš, 2022. "The role of investor attention in global asset price variation during the invasion of Ukraine," Finance Research Letters, Elsevier, vol. 50(C).
    14. Dimpfl, Thomas & Langen, Tobias, 2015. "A Cross-Country Analysis of Unemployment and Bonds with Long-Memory Relations," VfS Annual Conference 2015 (Muenster): Economic Development - Theory and Policy 112921, Verein für Socialpolitik / German Economic Association.
    15. Basistha, Arabinda & Kurov, Alexander & Wolfe, Marketa Halova, 2019. "Volatility Forecasting: The Role of Internet Search Activity and Implied Volatility," MPRA Paper 111037, University Library of Munich, Germany.
    16. Mario Maggi & Pierpaolo Uberti, 2021. "Google search volumes for portfolio management: performances and asset concentration," Annals of Operations Research, Springer, vol. 299(1), pages 163-175, April.
    17. Jacques Bughin, 2015. "Google searches and twitter mood: nowcasting telecom sales performance," Netnomics, Springer, vol. 16(1), pages 87-105, August.
    18. Chien-jung Ting & Yi-Long Hsiao, 2022. "Nowcasting the GDP in Taiwan and the Real-Time Tourism Data," Advances in Management and Applied Economics, SCIENPRESS Ltd, vol. 12(3), pages 1-2.
    19. Götz, Thomas B. & Knetsch, Thomas A., 2019. "Google data in bridge equation models for German GDP," International Journal of Forecasting, Elsevier, vol. 35(1), pages 45-66.
    20. González-Fernández, Marcos & González-Velasco, Carmen, 2020. "An alternative approach to predicting bank credit risk in Europe with Google data," Finance Research Letters, Elsevier, vol. 35(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:wsi:ijitdm:v:18:y:2019:i:05:n:s0219622019500287. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Tai Tone Lim. General contact details of provider: http://www.worldscinet.com/ijitdm/ijitdm.shtml.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.