Printed from https://ideas.repec.org/p/cam/camdae/2630.html

Text as Priors

Author

Listed:
  • Ge, S.
  • Li, S.
  • Linton, O. B.
  • Su, W.

Abstract

This paper studies how textual information can be used as prior information in high-dimensional penalized estimation. Rather than treating text as an additional set of regressors, we use textual analysis to construct two classes of priors: linkage priors, which encode the relative relevance of covariates, and direction priors, which encode prior information about coefficient signs. We incorporate these priors through weighted and asymmetric LASSO procedures. We show that, when the priors are sufficiently informative, they improve variable selection by relaxing the irrepresentable condition required for selection consistency of the standard LASSO, especially in settings with strongly correlated covariates. We illustrate the framework in three applications based on Chinese financial news: cross-firm return prediction, large precision matrix estimation for portfolio construction, and high-dimensional text regression with sentiment-based sign restrictions. Overall, the results show that text can enhance high-dimensional estimation not only as data, but also as a source of economically meaningful prior information.
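The abstract does not state the estimator's formula, so the following is only a minimal sketch of how a weighted, asymmetric LASSO could encode the two kinds of prior. It assumes the objective (1/2n)·||y − Xb||² + λ·Σ_j w_j·(a_j⁺·max(b_j, 0) + a_j⁻·max(−b_j, 0)), where a smaller linkage weight w_j marks a covariate the text suggests is relevant, and the penalty multiplier on the side that contradicts a direction prior is raised to `oppose`. The function names, the `oppose` parameter, and the toy data are all hypothetical and are not taken from the paper.

```python
def soft_threshold_asym(z, pen_pos, pen_neg):
    """Minimise 0.5*b**2 - z*b + pen_pos*max(b, 0) + pen_neg*max(-b, 0) over b."""
    if z > pen_pos:
        return z - pen_pos
    if z < -pen_neg:
        return z + pen_neg
    return 0.0


def prior_lasso(X, y, lam, weights=None, signs=None, oppose=1.0,
                sweeps=500, tol=1e-10):
    """Coordinate descent for a weighted, asymmetric LASSO (illustrative only).

    weights[j]: linkage prior; smaller means less shrinkage for covariate j.
    signs[j]:   direction prior; +1, -1, or 0 (no prior). The side that
                contradicts the prior has its penalty multiplied by `oppose`.
    """
    n, p = len(X), len(X[0])
    weights = weights or [1.0] * p
    signs = signs or [0] * p
    # Separate penalties for positive and negative coefficient values.
    pen_pos = [lam * w * (oppose if s < 0 else 1.0) for w, s in zip(weights, signs)]
    pen_neg = [lam * w * (oppose if s > 0 else 1.0) for w, s in zip(weights, signs)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, kept up to date
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (excluding j).
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold_asym(z / col_sq[j],
                                      pen_pos[j] / col_sq[j],
                                      pen_neg[j] / col_sq[j])
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy orthogonal design, so each coefficient is thresholded independently.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.0 * a + 0.3 * b - 0.5 * c for a, b, c in X]

plain = prior_lasso(X, y, lam=0.4)                                  # standard LASSO
linked = prior_lasso(X, y, lam=0.4, weights=[1.0, 0.5, 1.0])        # linkage prior on x2
directed = prior_lasso(X, y, lam=0.4, signs=[0, 0, 1], oppose=2.0)  # x3 expected positive
```

In this toy case the standard LASSO drops the second covariate, the lower linkage weight lets it back into the model, and the direction prior sets the third coefficient to zero because its estimate has the sign the prior penalizes. The paper's claim about relaxing the irrepresentable condition concerns correlated designs, which this orthogonal example deliberately avoids for the sake of transparency.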

Suggested Citation

  • Ge, S. & Li, S. & Linton, O. B. & Su, W., 2026. "Text as Priors," Cambridge Working Papers in Economics 2630, Faculty of Economics, University of Cambridge.
  • Handle: RePEc:cam:camdae:2630

    Download full text from publisher

    File URL: https://www.econ.cam.ac.uk/sites/default/files/publication-cwpe-pdfs/cwpe2630.pdf
    Download Restriction: no



