Printed from https://ideas.repec.org/a/wly/complx/v2022y2022i1n9031900.html

Stock Price Prediction Based on Natural Language Processing

Authors
  • Xiaobin Tang
  • Nuo Lei
  • Manru Dong
  • Dan Ma

Abstract

The keywords used in traditional stock price prediction are mainly selected on the basis of prior literature and experience. This study designs a new text mining method for keyword augmentation based on two natural language processing models: Bidirectional Encoder Representations from Transformers (BERT) and Neural Contextualized Representation for Chinese Language Understanding (NEZHA). The BERT vectorization model and the NEZHA keyword discrimination model extend the seed keywords along two dimensions, similarity and importance, respectively, and together construct a keyword thesaurus for stock price prediction. The predictive ability of the seed keywords and the generated keywords is then compared with a long short-term memory (LSTM) model, using the CSI 300 index as a case study. The results show that, compared with the seed keywords, the search indexes of the extracted keywords are more highly correlated with the CSI 300 and can improve its forecasting performance. The keyword augmentation model designed in this study can therefore serve as a reference for expanding other variables in financial time series forecasting.

Suggested Citation

  • Xiaobin Tang & Nuo Lei & Manru Dong & Dan Ma, 2022. "Stock Price Prediction Based on Natural Language Processing," Complexity, John Wiley & Sons, vol. 2022(1).
  • Handle: RePEc:wly:complx:v:2022:y:2022:i:1:n:9031900
    DOI: 10.1155/2022/9031900

    Download full text from publisher

    File URL: https://doi.org/10.1155/2022/9031900
    Download Restriction: no

    File URL: https://libkey.io/10.1155/2022/9031900?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Sidra Mehtab & Jaydip Sen, 2020. "A Time Series Analysis-Based Stock Price Prediction Using Machine Learning and Deep Learning Models," Papers 2004.11697, arXiv.org, revised May 2021.
    2. Sidra Mehtab & Jaydip Sen, 2020. "Stock Price Prediction Using Convolutional Neural Networks on a Multivariate Timeseries," Papers 2001.09769, arXiv.org.
    3. Sidra Mehtab & Jaydip Sen & Abhishek Dutta, 2020. "Stock Price Prediction Using Machine Learning and LSTM-Based Deep Learning Models," Papers 2009.10819, arXiv.org.
    4. Sidra Mehtab & Jaydip Sen, 2020. "Stock Price Prediction Using CNN and LSTM-Based Deep Learning Models," Papers 2010.13891, arXiv.org.
    5. Fama, Eugene F, 1970. "Efficient Capital Markets: A Review of Theory and Empirical Work," Journal of Finance, American Finance Association, vol. 25(2), pages 383-417, May.
    6. Scott Deerwester & Susan T. Dumais & George W. Furnas & Thomas K. Landauer & Richard Harshman, 1990. "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 41(6), pages 391-407, September.
    7. Ananda Chatterjee & Hrisav Bhowmick & Jaydip Sen, 2021. "Stock Price Prediction Using Time Series, Econometric, Machine Learning, and Deep Learning Models," Papers 2111.01137, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jaydip Sen & Ashwin Kumar R S & Geetha Joseph & Kaushik Muthukrishnan & Koushik Tulasi & Praveen Varukolu, 2022. "Precise Stock Price Prediction for Robust Portfolio Design from Selected Sectors of the Indian Stock Market," Papers 2201.05570, arXiv.org.
    2. Jaydip Sen & Sidra Mehtab, 2021. "Design and Analysis of Robust Deep Learning Models for Stock Price Prediction," Papers 2106.09664, arXiv.org.
    3. Jaydip Sen & Arpit Awad & Aaditya Raj & Gourav Ray & Pusparna Chakraborty & Sanket Das & Subhasmita Mishra, 2022. "Stock Performance Evaluation for Portfolio Design from Different Sectors of the Indian Stock Market," Papers 2208.07166, arXiv.org.
    4. Abhiraj Sen & Jaydip Sen, 2023. "Performance Evaluation of Equal-Weight Portfolio and Optimum Risk Portfolio on Indian Stocks," Papers 2309.13696, arXiv.org.
    5. Jaydip Sen & Saikat Mondal & Sidra Mehtab, 2021. "Analysis of Sectoral Profitability of the Indian Stock Market Using an LSTM Regression Model," Papers 2111.04976, arXiv.org.
    6. Jaydip Sen & Hetvi Waghela & Sneha Rakshit, 2024. "Exploring Sectoral Profitability in the Indian Stock Market Using Deep Learning," Papers 2407.01572, arXiv.org.
    7. Jaydip Sen, 2022. "Designing Efficient Pair-Trading Strategies Using Cointegration for the Indian Stock Market," Papers 2211.07080, arXiv.org.
    8. Jaydip Sen & Arup Dasgupta & Partha Pratim Sengupta & Sayantani Roy Choudhury, 2023. "A Comparative Study of Portfolio Optimization Methods for the Indian Stock Market," Papers 2310.14748, arXiv.org.
    9. Jaydip Sen & Abhishek Dutta, 2022. "Design and Analysis of Optimized Portfolios for Selected Sectors of the Indian Stock Market," Papers 2210.03943, arXiv.org.
    10. Jaydip Sen & Aditya Jaiswal & Anshuman Pathak & Atish Kumar Majee & Kushagra Kumar & Manas Kumar Sarkar & Soubhik Maji, 2023. "A Comparative Analysis of Portfolio Optimization Using Mean-Variance, Hierarchical Risk Parity, and Reinforcement Learning Approaches on the Indian Stock Market," Papers 2305.17523, arXiv.org.
    11. Jaydip Sen & Arup Dasgupta & Subhasis Dasgupta & Sayantani Roychoudhury, 2023. "A Portfolio Rebalancing Approach for the Indian Stock Market," Papers 2310.09770, arXiv.org.
    12. Jaydip Sen & Abhishek Dutta, 2022. "A Comparative Study of Hierarchical Risk Parity Portfolio and Eigen Portfolio on the NIFTY 50 Stocks," Papers 2210.00984, arXiv.org.
    13. Jaydip Sen & Rajdeep Sen & Abhishek Dutta, 2021. "Machine Learning in Finance-Emerging Trends and Challenges," Papers 2110.11999, arXiv.org.
    14. Li, Shuyue & Yarovaya, Larisa & Mishra, Tapas, 2025. "Machine learning, memory and efficiency in cryptocurrency markets," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 105(C).
    15. Yiyang Zheng, 2022. "Neural Network and Order Flow, Technical Analysis: Predicting short-term direction of futures contract," Papers 2203.12457, arXiv.org.
    16. Jaydip Sen & Saikat Mondal & Gourab Nath, 2022. "Robust Portfolio Design and Stock Price Prediction Using an Optimized LSTM Model," Papers 2204.01850, arXiv.org.
    17. Sidra Mehtab & Jaydip Sen, 2020. "A Time Series Analysis-Based Stock Price Prediction Using Machine Learning and Deep Learning Models," Papers 2004.11697, arXiv.org, revised May 2021.
    18. David M. Ritzwoller & Joseph P. Romano, 2019. "Uncertainty in the Hot Hand Fallacy: Detecting Streaky Alternatives to Random Bernoulli Sequences," Papers 1908.01406, arXiv.org, revised Apr 2021.
    19. Shazia Ghani, 2011. "A re-visit to Minsky after 2007 financial meltdown," Post-Print halshs-01027435, HAL.
    20. Christiane Goodfellow & Dirk Schiereck & Steffen Wippler, 2013. "Are behavioural finance equity funds a superior investment? A note on fund performance and market efficiency," Journal of Asset Management, Palgrave Macmillan, vol. 14(2), pages 111-119, April.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.