IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1003799.html

Quantifying the Impact and Extent of Undocumented Biomedical Synonymy

Author

Listed:
  • David R Blair
  • Kanix Wang
  • Svetlozar Nestorov
  • James A Evans
  • Andrey Rzhetsky

Abstract

Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through “crowd-sourcing.” Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for “next-generation,” high-coverage lexical terminologies.

Author Summary: Automated systems that extract and integrate information from the research literature have become common in biomedicine. As the same meaning can be expressed in many distinct but synonymous ways, access to comprehensive thesauri may enable such systems to maximize their performance. Here, we establish the importance of synonymy for a specific text-mining task (named-entity normalization), and we suggest that current thesauri may be woefully inadequate in their documentation of this linguistic phenomenon. To test this claim, we develop a model for estimating the amount of missing synonymy. We apply our model to both biomedical terminologies and general-English thesauri, predicting massive amounts of missing synonymy for both lexicons. Furthermore, we verify some of our predictions for the latter domain through “crowd-sourcing.” Overall, our work highlights the dramatic incompleteness of current biomedical thesauri, and to mitigate this issue, we propose the creation of “living” terminologies, which would automatically harvest undocumented synonymy and help smart machines enrich biomedicine.

Suggested Citation

  • David R Blair & Kanix Wang & Svetlozar Nestorov & James A Evans & Andrey Rzhetsky, 2014. "Quantifying the Impact and Extent of Undocumented Biomedical Synonymy," PLOS Computational Biology, Public Library of Science, vol. 10(9), pages 1-17, September.
  • Handle: RePEc:plo:pcbi00:1003799
    DOI: 10.1371/journal.pcbi.1003799

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003799
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003799&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1003799?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kau, James B. & Keenan, Donald C., 1999. "Patterns of rational default," Regional Science and Urban Economics, Elsevier, vol. 29(6), pages 765-785, November.
    2. Carol Alexander & Leonardo M. Nogueira, 2005. "Optimal Hedging and Scale Invariance: A Taxonomy of Option Pricing Models," ICMA Centre Discussion Papers in Finance icma-dp2005-10, Henley Business School, University of Reading, revised Nov 2005.
    3. Jun, Doobae & Ku, Hyejin, 2015. "Static hedging of chained-type barrier options," The North American Journal of Economics and Finance, Elsevier, vol. 33(C), pages 317-327.
    4. Vorst, A. C. F., 1988. "Option Pricing And Stochastic Processes," Econometric Institute Archives 272366, Erasmus University Rotterdam.
    5. Antoine Jacquier & Patrick Roome, 2015. "Black-Scholes in a CEV random environment," Papers 1503.08082, arXiv.org, revised Nov 2017.
    6. Boyarchenko, Svetlana & Levendorskiǐ, Sergei, 2007. "Optimal stopping made easy," Journal of Mathematical Economics, Elsevier, vol. 43(2), pages 201-217, February.
    7. Robert C. Merton, 2006. "Paul Samuelson and Financial Economics," The American Economist, Sage Publications, vol. 50(2), pages 9-31, October.
    8. Ammann, Manuel & Kind, Axel & Wilde, Christian, 2003. "Are convertible bonds underpriced? An analysis of the French market," Journal of Banking & Finance, Elsevier, vol. 27(4), pages 635-653, April.
    9. Sergio Zúñiga, 1999. "Modelos de Tasas de Interés en Chile: Una Revisión," Latin American Journal of Economics-formerly Cuadernos de Economía, Instituto de Economía. Pontificia Universidad Católica de Chile., vol. 36(108), pages 875-893.
    10. Zhijian (James) Huang & Yuchen Luo, 2016. "Revisiting Structural Modeling of Credit Risk—Evidence from the Credit Default Swap (CDS) Market," JRFM, MDPI, vol. 9(2), pages 1-20, May.
    11. José Martins & Rui Cunha Marques & Carlos Oliveira Cruz & Álvaro Fonseca, 2017. "Flexibility in planning and development of a container terminal: an application of an American-style call option," Transportation Planning and Technology, Taylor & Francis Journals, vol. 40(7), pages 828-840, October.
    12. Marcelo F. Perillo, 2021. "Valuación de Títulos de Deuda Indexados al Comportamiento de un Índice Accionario: Un Modelo sin Riesgo de Crédito," CEMA Working Papers: Serie Documentos de Trabajo. 784, Universidad del CEMA.
    13. Kartono, Agus & Solekha, Siti & Sumaryada, Tony & Irmansyah, 2021. "Foreign currency exchange rate prediction using non-linear Schrödinger equations with economic fundamental parameters," Chaos, Solitons & Fractals, Elsevier, vol. 152(C).
    14. Jochen Bigus, 2002. "Investitionsanreize, Koalitionsverhalten und Gläubigerkonflikte," Schmalenbach Journal of Business Research, Springer, vol. 54(4), pages 317-342, June.
    15. Kim, Amy M. & Li, Huanan, 2020. "Incorporating the impacts of climate change in transportation infrastructure decision models," Transportation Research Part A: Policy and Practice, Elsevier, vol. 134(C), pages 271-287.
    16. René Garcia & Richard Luger & Eric Renault, 2000. "Asymmetric Smiles, Leverage Effects and Structural Parameters," Working Papers 2000-57, Center for Research in Economics and Statistics.
    17. Wang, Jun & Liang, Jin-Rong & Lv, Long-Jin & Qiu, Wei-Yuan & Ren, Fu-Yao, 2012. "Continuous time Black–Scholes equation with transaction costs in subdiffusive fractional Brownian motion regime," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(3), pages 750-759.
    18. George W. Kutner & James A. Seifert, 1989. "The Valuation of Mortgage Loan Commitments Using Option Pricing Estimates," Journal of Real Estate Research, American Real Estate Society, vol. 4(2), pages 13-20.
    19. Hilscher, Jens & Raviv, Alon, 2014. "Bank stability and market discipline: The effect of contingent capital on risk taking and default probability," Journal of Corporate Finance, Elsevier, vol. 29(C), pages 542-560.
    20. Andres, Christian & Cumming, Douglas & Karabiber, Timur & Schweizer, Denis, 2014. "Do markets anticipate capital structure decisions? — Feedback effects in equity liquidity," Journal of Corporate Finance, Elsevier, vol. 27(C), pages 133-156.
