
Composite leading search index: a preprocessing method of internet search data for stock trends prediction

Author

Listed:
  • Ying Liu
  • Yibing Chen
  • Sheng Wu
  • Geng Peng
  • Benfu Lv

Abstract

Previous studies have shown that Internet search data is a new source of data that can be used to predict the stock market. In this new, data-driven research field, the choice of data preprocessing method is crucial to prediction accuracy. This paper proposes a preprocessing method for Internet search data, the composite leading search index (CLSI), which consists of three steps: (a) keyword selection, (b) time difference measurement, and (c) leading index composition. We demonstrate the validity of CLSI by comparing its results with those of the search volume index (SVI), the method most commonly used in the previous literature. We build a time series model (TS) with error correction and a support vector regression (SVR) model for stock trend prediction, and combine them with the two preprocessing methods into four models for comparison: SVI–TS, CLSI–TS, SVI–SVR, and CLSI–SVR. We test these four models on the Chinese stock market, which is attracting more and more investors, and analyze the results on nine datasets: stable, peak, and trough periods of the Shanghai Composite Index, the Shenzhen Composite Index, and the Hushen 300 Index. The results show that, with either TS or SVR as the forecasting model, CLSI performs better than SVI on the majority of the test datasets and almost the same as SVI on the rest. The results give some support to the view that CLSI is a more efficient preprocessing method for Internet search data in stock trend prediction. Copyright Springer Science+Business Media New York 2015
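
The abstract names the three CLSI steps but does not give their formulas. The Python sketch below is one possible reading, not the authors' implementation. It makes the following assumptions: the time difference of a keyword is the lag (in periods) at which its search volume correlates most strongly with the later stock index; a keyword is kept only if that correlation passes a threshold; and the composite index is the plain average of the standardized, lag-shifted series. The function names, `max_lag`, and `min_corr` are hypothetical.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t] (lag >= 1)."""
    return float(np.corrcoef(x[:-lag], y[lag:])[0, 1])

def best_lead(x, y, max_lag):
    """Step (b), assumed form: the lag in 1..max_lag with the largest |corr|."""
    scores = [(lag, lagged_corr(x, y, lag)) for lag in range(1, max_lag + 1)]
    return max(scores, key=lambda s: abs(s[1]))

def composite_leading_index(searches, target, max_lag=5, min_corr=0.5):
    """searches: dict keyword -> 1-D array; target: 1-D array of equal length.

    Returns (index, leads). index[t] averages each kept keyword's standardized
    search volume at t - lead. Series are assumed to be non-constant.
    """
    target = np.asarray(target, dtype=float)
    n = len(target)
    leads, aligned = {}, []
    for keyword, series in searches.items():
        x = np.asarray(series, dtype=float)
        lag, corr = best_lead(x, target, max_lag)
        if abs(corr) < min_corr:      # step (a), assumed form: drop weak keywords
            continue
        leads[keyword] = lag
        # Standardize with full-sample statistics (a simplification).
        z = (x - x.mean()) / x.std()
        shifted = np.full(n, np.nan)
        # Move the series forward by its lead; flip negatively related keywords.
        shifted[lag:] = np.sign(corr) * z[:-lag]
        aligned.append(shifted)
    if not aligned:
        return np.full(n, np.nan), leads
    # Step (c), assumed form: equal-weight average. Positions before the
    # largest lead stay NaN because not every series is available there.
    return np.mean(aligned, axis=0), leads
```

For example, `composite_leading_index({"keyword_a": volume_a}, index_values)` returns the composite series together with the lead chosen for each retained keyword. The resulting series could then be used as the regressor in a TS or SVR model in place of the raw search volume.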

Suggested Citation

  • Ying Liu & Yibing Chen & Sheng Wu & Geng Peng & Benfu Lv, 2015. "Composite leading search index: a preprocessing method of internet search data for stock trends prediction," Annals of Operations Research, Springer, vol. 234(1), pages 77-94, November.
  • Handle: RePEc:spr:annopr:v:234:y:2015:i:1:p:77-94:10.1007/s10479-014-1779-z
    DOI: 10.1007/s10479-014-1779-z

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s10479-014-1779-z
    Download Restriction: Access to full text is restricted to subscribers.



    Citations


    Cited by:

    1. Jichang Dong & Wei Dai & Ying Liu & Lean Yu & Jie Wang, 2019. "Forecasting Chinese Stock Market Prices using Baidu Search Index with a Learning-Based Data Collection Method," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(05), pages 1605-1629, September.
    2. Prabhsimran Singh & Yogesh K. Dwivedi & Karanjeet Singh Kahlon & Ravinder Singh Sawhney & Ali Abdallah Alalwan & Nripendra P. Rana, 2020. "Smart Monitoring and Controlling of Government Policies Using Social Media and Cloud Computing," Information Systems Frontiers, Springer, vol. 22(2), pages 315-337, April.
    3. Mario Maggi & Pierpaolo Uberti, 2021. "Google search volumes for portfolio management: performances and asset concentration," Annals of Operations Research, Springer, vol. 299(1), pages 163-175, April.
    4. Madanjit Singh & Amardeep Singh & Sarveshwar Bharti & Prithvipal Singh & Munish Saini, 2022. "Using Social Media Analytics and Machine Learning Approaches to Analyze the Behavioral Response of Agriculture Stakeholders during the COVID-19 Pandemic," Sustainability, MDPI, vol. 14(23), pages 1-18, December.
    5. Li, Cheng & Ge, Peng & Liu, Zhusheng & Zheng, Weimin, 2020. "Forecasting tourist arrivals using denoising and potential factors," Annals of Tourism Research, Elsevier, vol. 83(C).
    6. Jingwen Liu & Peng Zou & Yu Ma, 2022. "The Effect of Air Pollution on Food Preferences," Journal of the Academy of Marketing Science, Springer, vol. 50(2), pages 410-423, March.
    7. Hua Wu & Taiwen Feng & Wenbo Jiang & Ting Kong, 2022. "Environmental Penalties, Investor Attention and Stock Market Reaction: Moderating Roles of Air Pollution and Industry Saliency," IJERPH, MDPI, vol. 19(5), pages 1-27, February.
    8. Shaolong Sun & Yanzhao Li & Ju-e Guo & Shouyang Wang, 2020. "Tourism Demand Forecasting: An Ensemble Deep Learning Approach," Papers 2002.07964, arXiv.org, revised Jan 2021.
    9. Lin, Yong & Wang, Renyu & Gong, Xingyue & Jia, Guozhu, 2022. "Cross-correlation and forecast impact of public attention on USD/CNY exchange rate: Evidence from Baidu Index," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 604(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jichang Dong & Wei Dai & Ying Liu & Lean Yu & Jie Wang, 2019. "Forecasting Chinese Stock Market Prices using Baidu Search Index with a Learning-Based Data Collection Method," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(05), pages 1605-1629, September.
    2. Coble, David & Pincheira, Pablo, 2017. "Nowcasting Building Permits with Google Trends," MPRA Paper 76514, University Library of Munich, Germany.
    3. Fabio Milani, 2021. "COVID-19 outbreak, social response, and early economic effects: a global VAR analysis of cross-country interdependencies," Journal of Population Economics, Springer;European Society for Population Economics, vol. 34(1), pages 223-252, January.
    4. D’Amuri, Francesco & Marcucci, Juri, 2017. "The predictive power of Google searches in forecasting US unemployment," International Journal of Forecasting, Elsevier, vol. 33(4), pages 801-816.
    5. Meshcheryakov, Artem & Winters, Drew B., 2022. "Retail investor attention and the limit order book: Intraday analysis of attention-based trading," International Review of Financial Analysis, Elsevier, vol. 81(C).
    6. Papadamou, Stephanos & Fassas, Athanasios & Kenourgios, Dimitris & Dimitriou, Dimitrios, 2020. "Direct and Indirect Effects of COVID-19 Pandemic on Implied Stock Market Volatility: Evidence from Panel Data Analysis," MPRA Paper 100020, University Library of Munich, Germany.
    7. Huang, Xiankai & Zhang, Lifeng & Ding, Yusi, 2017. "The Baidu Index: Uses in predicting tourism flows – A case study of the Forbidden City," Tourism Management, Elsevier, vol. 58(C), pages 301-306.
    8. Dean Fantazzini, 2014. "Nowcasting and Forecasting the Monthly Food Stamps Data in the US Using Online Search Data," PLOS ONE, Public Library of Science, vol. 9(11), pages 1-27, November.
    9. Smales, L.A., 2021. "Investor attention and global market returns during the COVID-19 crisis," International Review of Financial Analysis, Elsevier, vol. 73(C).
    10. Bangwayo-Skeete, Prosper F. & Skeete, Ryan W., 2015. "Can Google data improve the forecasting performance of tourist arrivals? Mixed-data sampling approach," Tourism Management, Elsevier, vol. 46(C), pages 454-464.
    11. Thomas Dimpfl & Tobias Langen, 2019. "How Unemployment Affects Bond Prices: A Mixed Frequency Google Nowcasting Approach," Computational Economics, Springer;Society for Computational Economics, vol. 54(2), pages 551-573, August.
    12. Gomes, Pedro & Taamouti, Abderrahim, 2016. "In search of the determinants of European asset market comovements," International Review of Economics & Finance, Elsevier, vol. 44(C), pages 103-117.
    13. Halousková, Martina & Stašek, Daniel & Horváth, Matúš, 2022. "The role of investor attention in global asset price variation during the invasion of Ukraine," Finance Research Letters, Elsevier, vol. 50(C).
    14. Böhme, Marcus H. & Gröger, André & Stöhr, Tobias, 2020. "Searching for a better life: Predicting international migration with online search keywords," Journal of Development Economics, Elsevier, vol. 142(C).
    15. Tong Liu & Guojun He & Alexis Lau, 2018. "Avoidance behavior against air pollution: evidence from online search indices for anti-PM2.5 masks and air filters in Chinese cities," Environmental Economics and Policy Studies, Springer;Society for Environmental Economics and Policy Studies - SEEPS, vol. 20(2), pages 325-363, April.
    16. Basistha, Arabinda & Kurov, Alexander & Wolfe, Marketa Halova, 2019. "Volatility Forecasting: The Role of Internet Search Activity and Implied Volatility," MPRA Paper 111037, University Library of Munich, Germany.
    17. Bentzen, Jeanet Sinding, 2021. "In crisis, we pray: Religiosity and the COVID-19 pandemic," Journal of Economic Behavior & Organization, Elsevier, vol. 192(C), pages 541-583.
    18. Perroni, Carlo & Scharf, Kimberley & Talavera, Oleksandr & Vi, Linh, 2022. "Does online salience predict charitable giving? Evidence from SMS text donations," Journal of Economic Behavior & Organization, Elsevier, vol. 197(C), pages 134-149.
    19. Götz, Thomas B. & Knetsch, Thomas A., 2019. "Google data in bridge equation models for German GDP," International Journal of Forecasting, Elsevier, vol. 35(1), pages 45-66.
    20. Abay, Kibrom A. & Hirfrfot, Kibrom Tafere & Woldemichael, Andinet, 2020. "Winners and Losers from COVID-19: Global Evidence from Google Search," Policy Research Working Paper Series 9268, The World Bank.
