Mining the Web for the Voice of the Herd to Track Stock Market Bubbles
We show that power-law analyses of financial commentaries from newspaper websites can be used to identify stock market bubbles, supplementing traditional volatility analyses. Using a four-year corpus of 17,713 online, finance-related articles (10M+ words) from the Financial Times, the New York Times, and the BBC, we show that week-to-week changes in power-law distributions reflect market movements of the Dow Jones Industrial Average (DJI), the FTSE-100, and the NIKKEI-225. Notably, the statistical regularities in language track the 2007 stock market bubble, showing emerging structure in the language of commentators as progressively greater agreement arose in their positive perceptions of the market. Furthermore, a Kullback-Leibler analysis reveals a marked divergence in positive language during the bubble period.
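The abstract names two measurements, a power-law fit to word-frequency distributions and a Kullback-Leibler comparison between distributions, without giving the procedure. The following is a minimal sketch of how such quantities could be computed for two weekly token lists. It is not the authors' method: the function names `rank_frequency_slope` and `kl_divergence`, the least-squares fit on log-log axes, and the add-alpha smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def rank_frequency_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Assumption: a simple log-log regression stands in for whatever
    power-law fitting procedure the paper actually uses.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def kl_divergence(tokens_p, tokens_q, alpha=1.0):
    """D_KL(P || Q) in nats over the joint vocabulary.

    Assumption: add-alpha smoothing keeps every probability nonzero so
    the logarithm is always defined.
    """
    counts_p, counts_q = Counter(tokens_p), Counter(tokens_q)
    vocab = set(counts_p) | set(counts_q)
    total_p = sum(counts_p.values()) + alpha * len(vocab)
    total_q = sum(counts_q.values()) + alpha * len(vocab)
    divergence = 0.0
    for word in vocab:
        p = (counts_p[word] + alpha) / total_p
        q = (counts_q[word] + alpha) / total_q
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical toy data standing in for two weeks of positive-sentiment words.
week_1 = ["rise"] * 6 + ["gain"] * 3 + ["rally"] * 2
week_2 = ["rise"] * 9 + ["gain"] * 1 + ["soar"] * 1

slope_1 = rank_frequency_slope(week_1)
slope_2 = rank_frequency_slope(week_2)
shift = kl_divergence(week_1, week_2)
```

In this sketch, a steeper (more negative) slope means usage is concentrated on fewer words, which is one way to read "greater agreement" among commentators, and a larger divergence between consecutive weeks signals a bigger change in vocabulary. Note that Kullback-Leibler divergence is asymmetric, so the order of the two weeks matters.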
Handle: RePEc:arx:papers:1212.2676