
Supervised similarity learning for corporate bonds using Random Forest proximities

Authors
  • Jerinsh Jeyapaulraj
  • Dhruv Desai
  • Peter Chu
  • Dhagash Mehta
  • Stefano Pasquali
  • Philip Sommer

Abstract

Financial literature contains ample research on the similarity and comparison of financial assets and securities such as stocks, bonds, and mutual funds. However, going beyond correlations or aggregate statistics has been arduous, since financial datasets are noisy, lack useful features, have missing data, and often lack ground truth or annotated labels. Moreover, although similarity heuristically extrapolated from these traditional models may work well at an aggregate level, such as risk management for large portfolios, it often fails when used for portfolio construction and trading, which require a local and dynamic measure of similarity on top of a global one. In this paper we propose a supervised similarity framework for corporate bonds that allows for inference based on both local and global measures. From a machine learning perspective, this paper emphasizes that random forest (RF), which is usually viewed as a supervised learning algorithm, can also be used as a similarity learning (more specifically, a distance metric learning) algorithm. In addition, this framework proposes a novel metric to evaluate similarities and analyses other metrics, which further demonstrate that RF outperforms all other methods tested in this work.
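The abstract's central idea, treating a trained random forest as a similarity learner, can be illustrated with Breiman's classic proximity measure: two samples are similar to the extent that the forest routes them into the same leaves. The sketch below is an illustration under stated assumptions, not the authors' code. It uses Breiman's original definition (the paper may use a different proximity variant), converts proximity to a distance as 1 - proximity, and introduces hypothetical helper names (`rf_proximity`, `rf_distance`, `nearest_bonds`, `demo`). The features and target in `demo` are synthetic placeholders, not the paper's bond data or target variable.

```python
import numpy as np


def rf_proximity(leaves: np.ndarray) -> np.ndarray:
    """Breiman-style proximity from a (n_samples, n_trees) matrix of leaf indices.

    P[i, j] is the fraction of trees in which samples i and j land in the same leaf.
    """
    leaves = np.asarray(leaves)
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        # Pairwise indicator: 1.0 where two samples share a leaf in tree t.
        prox += (col[:, None] == col[None, :]).astype(float)
    return prox / n_trees


def rf_distance(prox: np.ndarray) -> np.ndarray:
    # Assumed conversion from similarity to dissimilarity (one common choice).
    return 1.0 - prox


def nearest_bonds(prox: np.ndarray, i: int, k: int) -> list:
    # Hypothetical helper: the k samples most similar to sample i, excluding i.
    order = np.argsort(-prox[i], kind="stable")
    return [int(j) for j in order if j != i][:k]


def demo() -> None:
    # Requires scikit-learn. X and y are synthetic stand-ins for bond
    # characteristics and a hypothetical supervised target.
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)
    rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
    rf.fit(X, y)
    # apply() returns the leaf index of every sample in every tree.
    prox = rf_proximity(rf.apply(X))
    print(nearest_bonds(prox, 0, 5))
```

Because the forest is trained on a supervised target, two samples end up close only if they share the feature splits that matter for predicting that target, which is what makes the resulting metric "supervised". Calling `demo()` prints the five nearest neighbours of the first synthetic sample; the same leaf-index matrix is available from `RandomForestClassifier.apply` for classification targets.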

Suggested Citation

  • Jerinsh Jeyapaulraj & Dhruv Desai & Peter Chu & Dhagash Mehta & Stefano Pasquali & Philip Sommer, 2022. "Supervised similarity learning for corporate bonds using Random Forest proximities," Papers 2207.04368, arXiv.org, revised Oct 2022.
  • Handle: RePEc:arx:papers:2207.04368

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2207.04368
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. John R. J. Thompson & Longlong Feng & R. Mark Reesor & Chuck Grace, 2021. "Know Your Clients’ Behaviours: A Cluster Analysis of Financial Transactions," JRFM, MDPI, vol. 14(2), pages 1-29, January.
    2. Miceli, M.A. & Susinno, G., 2004. "Ultrametricity in fund of funds diversification," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 344(1), pages 95-99.
    3. Lin, Yi & Jeon, Yongho, 2006. "Random Forests and Adaptive Nearest Neighbors," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 578-590, June.
    4. Vipul Satone & Dhruv Desai & Dhagash Mehta, 2021. "Fund2Vec: Mutual Funds Similarity using Graph Learning," Papers 2106.12987, arXiv.org.
    5. Rajna Gibson & Sébastien Gyger, 2007. "The Style Consistency of Hedge Funds," European Financial Management, European Financial Management Association, vol. 13(2), pages 287-308, March.
    6. Nandita Das, 2003. "Hedge Fund Classification using K-means Clustering Method," Computing in Economics and Finance 2003 284, Society for Computational Economics.
    7. Cynthia Pagliaro & Dhagash Mehta & Han-Tai Shiao & Shaofei Wang & Luwei Xiong, 2021. "Investor Behavior Modeling by Analyzing Financial Advisor Notes: A Machine Learning Perspective," Papers 2107.05592, arXiv.org.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Dhruv Desai & Ashmita Dhiman & Tushar Sharma & Deepika Sharma & Dhagash Mehta & Stefano Pasquali, 2023. "Quantifying Outlierness of Funds from their Categories using Supervised Similarity," Papers 2308.06882, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dhagash Mehta & Dhruv Desai & Jithin Pradeep, 2020. "Machine Learning Fund Categorizations," Papers 2006.00123, arXiv.org.
    2. Dhruv Desai & Ashmita Dhiman & Tushar Sharma & Deepika Sharma & Dhagash Mehta & Stefano Pasquali, 2023. "Quantifying Outlierness of Funds from their Categories using Supervised Similarity," Papers 2308.06882, arXiv.org.
    3. Darolles, Serge & Gourieroux, Christian, 2010. "Conditionally fitted Sharpe performance with an application to hedge fund rating," Journal of Banking & Finance, Elsevier, vol. 34(3), pages 578-593, March.
    4. Goldstein, Benjamin A. & Polley, Eric C. & Briggs, Farren B. S., 2011. "Random Forests for Genetic Association Studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-34, July.
    5. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
    6. Uri Kartoun, 2013. "A Method for Comparing Hedge Funds," Papers 1303.0073, arXiv.org, revised Mar 2013.
    7. Harald Oberhofer & Marian Schwinner, 2017. "Do Individual Salaries Depend On the Performance of the Peers? Prototype Heuristic and Wage Bargaining in the NBA," WIFO Working Papers 534, WIFO.
    8. Sexton, Joseph & Laake, Petter, 2009. "Standard errors for bagged and random forest estimators," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 801-811, January.
    9. Conlon, T. & Ruskin, H.J. & Crane, M., 2007. "Random matrix theory and fund of funds portfolio optimisation," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 382(2), pages 565-576.
    10. Joshua Rosaler & Dhruv Desai & Bhaskarjit Sarmah & Dimitrios Vamvourellis & Deran Onay & Dhagash Mehta & Stefano Pasquali, 2023. "Towards Enhanced Local Explainability of Random Forests: a Proximity-Based Approach," Papers 2310.12428, arXiv.org.
    11. Mendez, Guillermo & Lohr, Sharon, 2011. "Estimating residual variance in random forest regression," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2937-2950, November.
    12. R. Gibson Brandon & S. Gyger, 2011. "Optimal hedge fund portfolios under liquidation risk," Quantitative Finance, Taylor & Francis Journals, vol. 11(1), pages 53-67.
    13. Rian Dolphin & Barry Smyth & Ruihai Dong, 2022. "A Multimodal Embedding-Based Approach to Industry Classification in Financial Markets," Papers 2211.06378, arXiv.org.
    14. Li, Yiliang & Bai, Xiwen & Wang, Qi & Ma, Zhongjun, 2022. "A big data approach to cargo type prediction and its implications for oil trade estimation," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 165(C).
    15. Paolo Giudici & Gloria Polinesi, 2021. "Crypto price discovery through correlation networks," Annals of Operations Research, Springer, vol. 299(1), pages 443-457, April.
    16. Yi Fu & Shuai Cao & Tao Pang, 2020. "A Sustainable Quantitative Stock Selection Strategy Based on Dynamic Factor Adjustment," Sustainability, MDPI, vol. 12(10), pages 1-12, May.
    17. Ishwaran, Hemant & Kogalur, Udaya B., 2010. "Consistency of random survival forests," Statistics & Probability Letters, Elsevier, vol. 80(13-14), pages 1056-1064, July.
    18. José María Sarabia & Faustino Prieto & Vanesa Jordá & Stefan Sperlich, 2020. "A Note on Combining Machine Learning with Statistical Modeling for Financial Data Analysis," Risks, MDPI, vol. 8(2), pages 1-14, April.
    19. Biau, Gérard & Devroye, Luc, 2010. "On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification," Journal of Multivariate Analysis, Elsevier, vol. 101(10), pages 2499-2518, November.
    20. Olivier Biau & Angela D'Elia, 2010. "Euro Area GDP Forecast Using Large Survey Dataset - A Random Forest Approach," EcoMod2010 259600029, EcoMod.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2207.04368. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the arXiv administrators. General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.