Printed from https://ideas.repec.org/p/arx/papers/2307.09332.html

Company2Vec -- German Company Embeddings based on Corporate Websites

Author

Listed:
  • Christopher Gerling

Abstract

With Company2Vec, the paper proposes a novel application of representation learning. The model analyzes business activities from unstructured company website data using Word2Vec and dimensionality reduction. Company2Vec preserves semantic language structures and thus creates efficient company embeddings for fine-grained industries. These semantic embeddings can be used for various applications in banking. Direct relations between companies and words allow semantic business analytics (e.g., the top-n words for a company). Furthermore, industry prediction is presented as a supervised learning application and evaluation method. The vectorized structure of the embeddings allows company similarities to be measured with the cosine distance. Company2Vec hence offers a more fine-grained comparison of companies than the standard industry labels (NACE). This property is relevant for unsupervised learning tasks such as clustering. An alternative industry segmentation is shown with k-means clustering on the company embeddings. Finally, the paper proposes three algorithms for (1) firm-centric, (2) industry-centric, and (3) portfolio-centric peer-firm identification.
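
As a rough illustration of the similarity idea in the abstract, the sketch below ranks peer companies by cosine similarity (one minus the cosine distance) over toy embedding vectors. It is not the paper's algorithm: the company names, the 3-dimensional vectors, and the helper functions `cosine_similarity` and `top_peers` are all hypothetical and exist only for this example. Real embeddings would come from a trained model and have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_peers(target, embeddings, n=2):
    """Return the n companies most similar to `target`, best first."""
    scores = [
        (name, cosine_similarity(embeddings[target], vec))
        for name, vec in embeddings.items()
        if name != target  # a company is not its own peer
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:n]


# Hypothetical toy embeddings (assumption: invented for illustration only).
toy_embeddings = {
    "bakery_a": [0.9, 0.1, 0.0],
    "bakery_b": [0.8, 0.2, 0.1],
    "software_c": [0.1, 0.9, 0.3],
    "software_d": [0.0, 0.8, 0.4],
}

peers = top_peers("bakery_a", toy_embeddings, n=2)
for name, score in peers:
    print(f"{name}: {score:.3f}")
```

In this toy setup the second bakery ranks first for `bakery_a`, which is the kind of within-industry ordering that a single shared industry label cannot express.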

Suggested Citation

  • Christopher Gerling, 2023. "Company2Vec -- German Company Embeddings based on Corporate Websites," Papers 2307.09332, arXiv.org.
  • Handle: RePEc:arx:papers:2307.09332

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2307.09332
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Samuel Ronnqvist & Peter Sarlin, 2014. "Bank Networks from Text: Interrelations, Centrality and Determinants," Papers 1406.7752, arXiv.org, revised Jul 2015.
    2. Gerard Hoberg & Gordon Phillips, 2016. "Text-Based Network Industries and Endogenous Product Differentiation," Journal of Political Economy, University of Chicago Press, vol. 124(5), pages 1423-1465.
    3. Lee, Charles M.C. & Ma, Paul & Wang, Charles C.Y., 2015. "Search-based peer firms: Aggregating investor perceptions through internet co-searches," Journal of Financial Economics, Elsevier, vol. 116(2), pages 410-431.
    4. Samuel Rönnqvist & Peter Sarlin, 2015. "Bank networks from text: interrelations, centrality and determinants," Quantitative Finance, Taylor & Francis Journals, vol. 15(10), pages 1619-1635, October.

    Citations

    Cited by:

    1. Christopher Gerling & Stefan Lessmann, 2023. "Multimodal Document Analytics for Banking Process Automation," Papers 2307.11845, arXiv.org, revised Nov 2023.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dimitrios Vamvourellis & Máté Toth & Snigdha Bhagat & Dhruv Desai & Dhagash Mehta & Stefano Pasquali, 2023. "Company Similarity using Large Language Models," Papers 2308.08031, arXiv.org.
    2. Xi Zhang & Jiawei Shi & Di Wang & Binxing Fang, 2018. "Exploiting Investors Social Network for Stock Prediction in China's Market," Papers 1801.00597, arXiv.org.
    3. Andy Fodor & Randy D. Jorgensen & John D. Stowe, 2021. "Financial clusters, industry groups, and stock return correlations," Journal of Financial Research, Southern Finance Association;Southwestern Finance Association, vol. 44(1), pages 121-144, April.
    4. Ge, S., 2020. "Text-Based Linkages and Local Risk Spillovers in the Equity Market," Cambridge Working Papers in Economics 20115, Faculty of Economics, University of Cambridge.
    5. Chen, Zilin & Guo, Li & Tu, Jun, 2021. "Media connection and return comovement," Journal of Economic Dynamics and Control, Elsevier, vol. 130(C).
    6. Aobdia, Daniel & Cheng, Lin, 2018. "Unionization, product market competition, and strategic disclosure," Journal of Accounting and Economics, Elsevier, vol. 65(2), pages 331-357.
    7. Samuel Ronnqvist & Peter Sarlin, 2015. "Detect & Describe: Deep learning of bank stress in the news," Papers 1507.07870, arXiv.org.
    8. Lee, Charles M.C. & Sun, Stephen Teng & Wang, Rongfei & Zhang, Ran, 2019. "Technological links and predictable returns," Journal of Financial Economics, Elsevier, vol. 132(3), pages 76-96.
    9. Zheng, Hannan & Schwenkler, Gustavo, 2020. "The network of firms implied by the news," ESRB Working Paper Series 108, European Systemic Risk Board.
    10. Samuel Ronnqvist & Peter Sarlin, 2016. "Bank distress in the news: Describing events through deep learning," Papers 1603.05670, arXiv.org, revised Dec 2016.
    11. Fang, Libing & Sun, Boyang & Li, Huijing & Yu, Honghai, 2018. "Systemic risk network of Chinese financial institutions," Emerging Markets Review, Elsevier, vol. 35(C), pages 190-206.
    12. Liu, Wei & Ma, Qianting & Liu, Xiaoxing, 2022. "Research on the dynamic evolution and its influencing factors of stock correlation network in the Chinese new energy market," Finance Research Letters, Elsevier, vol. 45(C).
    13. Zhibin Niu & Junqi Wu & Dawei Cheng & Jiawan Zhang, 2021. "Regshock: Interactive Visual Analytics of Systemic Risk in Financial Networks," Papers 2104.11863, arXiv.org.
    14. Zhibin Niu & Runlin Li & Junqi Wu & Dawei Cheng & Jiawan Zhang, 2020. "iConViz: Interactive Visual Exploration of the Default Contagion Risk of Networked-Guarantee Loans," Papers 2006.09542, arXiv.org, revised Aug 2020.
    15. Jonathan Ross & David Ziebart & Anthony Meder, 2019. "A new measure of firm-group accounting closeness," Review of Quantitative Finance and Accounting, Springer, vol. 52(4), pages 1137-1161, May.
    16. Keongtae Kim & Anandasivam Gopal & Gerard Hoberg, 2016. "Does Product Market Competition Drive CVC Investment? Evidence from the U.S. IT Industry," Information Systems Research, INFORMS, vol. 27(2), pages 259-281, June.
    17. Bernard, Darren & Blackburne, Terrence & Thornock, Jacob, 2020. "Information flows among rivals and corporate investment," Journal of Financial Economics, Elsevier, vol. 136(3), pages 760-779.
    18. Paul Brockman & Dennis Y Chung & Neal M Snow, 2023. "Search-Based Peer Groups and Commonality in Liquidity," Review of Finance, European Finance Association, vol. 27(1), pages 33-77.
    19. Thomas Forss & Peter Sarlin, 2017. "News-sentiment networks as a risk indicator," Papers 1706.05812, arXiv.org, revised May 2018.
    20. Li, Jingyu & Li, Jianping & Zhu, Xiaoqian, 2020. "Risk dependence between energy corporations: A text-based measurement approach," International Review of Economics & Finance, Elsevier, vol. 68(C), pages 33-46.
