Extracting Proceedings Data from Court Cases with Machine Learning

Author

Listed:
  • Bruno Mathis

    (CHROME Laboratory, Nimes University, 5 Rue du Docteur Georges Salan CS 13019, 30021 Nîmes, France;
    European Centre of Law & Economics of ESSEC Business School, 3 Av. Bernard Hirsch, 95000 Cergy, France)

Abstract

France is rolling out an open data program covering all court cases, but with little metadata attached. Reusers will have to apply named-entity recognition (NER) to the text body of a case to extract any value from it. Any court case may include up to 26 variables, or labels, that relate to the proceeding regardless of the substance of the case. These labels are of different syntactic types; some are rare, others ubiquitous. This experiment compares different algorithms, namely CRF, spaCy, Flair and DeLFT, for extracting proceedings data, and uses the model assessment capabilities of Kairntech, an NLP platform. It shows that an NER model can handle this large and diverse set of labels and extract high-quality data. We achieved an F1 measure of 87.5% with Flair trained on more than 27,000 manual annotations. Quality may yet be improved by combining NER models by data type.
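
The F1 measure quoted in the abstract combines precision and recall over extracted entities. The abstract does not say how matches were counted, so the following is only a minimal sketch that assumes exact-match, micro-averaged scoring over character spans. The `Entity` type, the `micro_f1` helper and the label names `COURT` and `DECISION_DATE` are hypothetical illustrations, not taken from the paper or from Kairntech.

```python
from __future__ import annotations

from typing import NamedTuple


class Entity(NamedTuple):
    start: int  # character offset where the span begins
    end: int    # character offset just past the span
    label: str  # hypothetical label name, e.g. "COURT"


def micro_f1(gold: list[set[Entity]], predicted: list[set[Entity]]) -> dict[str, float]:
    """Exact-match, micro-averaged precision, recall and F1 across documents.

    Assumption: a prediction counts as correct only if its start, end and
    label all equal those of a gold annotation in the same document.
    """
    tp = fp = fn = 0
    for gold_doc, pred_doc in zip(gold, predicted):
        tp += len(gold_doc & pred_doc)  # spans found with the right label
        fp += len(pred_doc - gold_doc)  # spurious or mis-bounded predictions
        fn += len(gold_doc - pred_doc)  # gold annotations that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# One document: the court is found exactly, the date span ends too late.
gold = [{Entity(0, 14, "COURT"), Entity(18, 28, "DECISION_DATE")}]
predicted = [{Entity(0, 14, "COURT"), Entity(18, 30, "DECISION_DATE")}]
scores = micro_f1(gold, predicted)
print(scores)
```

Under exact matching, the mis-bounded date counts as both a false positive and a false negative, so boundary errors are penalized twice. A more lenient overlap-based criterion would score the same prediction higher.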

Suggested Citation

  • Bruno Mathis, 2022. "Extracting Proceedings Data from Court Cases with Machine Learning," Stats, MDPI, vol. 5(4), pages 1-16, December.
  • Handle: RePEc:gam:jstats:v:5:y:2022:i:4:p:79-1320:d:1001711
    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/5/4/79/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/5/4/79/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Anthony Niblett, 2018. "Regulatory Reform in Ontario: Machine Learning and Regulation," C.D. Howe Institute Commentary, C.D. Howe Institute, issue 507, March.
    2. Prof. Dr. Sejdi Rexhepi & Mjellma Kadriu, 2018. "The Importance of Resource Assessment for Entrepreneurship and Local Economic Development in Kosovo," European Journal of Economics and Business Studies Articles, Revistia Research and Publishing, vol. 4, January.
    3. Amedeo Santosuosso & Giulia Pinotti, 2020. "Bottleneck or Crossroad? Problems of Legal Sources Annotation and Some Theoretical Thoughts," Stats, MDPI, vol. 3(3), pages 1-20, September.
    4. Alain Marciano & Antonio Nicita & Giovanni Battista Ramello, 2020. "Big data and big techs: understanding the value of information in platform capitalism," European Journal of Law and Economics, Springer, vol. 50(3), pages 345-358, December.
    5. Ulenaers Jasper, 2020. "The Impact of Artificial Intelligence on the Right to a Fair Trial: Towards a Robot Judge?," Asian Journal of Law and Economics, De Gruyter, vol. 11(2), pages 1, August.
    6. Zhong, Weifeng & Chan, Julian, 2020. "Predicting Authoritarian Crackdowns: A Machine Learning Approach," Working Papers 10464, George Mason University, Mercatus Center.
    7. Bălan Carmen, 2018. "The Impact of Conversational Agents on Humans in Services: Research Questions and Hypotheses," International Conference on Marketing and Business Development Journal, The Bucharest University of Economic Studies, vol. 1(2), pages 33-55, December.
    8. So-Hui Park & Dong-Gu Lee & Jin-Sung Park & Jun-Woo Kim, 2021. "A Survey of Research on Data Analytics-Based Legal Tech," Sustainability, MDPI, vol. 13(14), pages 1-24, July.
    9. Aisdl, 2020. "Becoming Attuned," OSF Preprints j7f8y, Center for Open Science.
    10. Bokwon Lee & Kyu-Min Lee & Jae-Suk Yang, 2019. "Network structure reveals patterns of legal complexity in human society: The case of the Constitutional legal network," PLOS ONE, Public Library of Science, vol. 14(1), pages 1-15, January.
    11. Giansiracusa, Noah & Ricciardi, Cameron, 2019. "Computational geometry and the U.S. Supreme Court," Mathematical Social Sciences, Elsevier, vol. 98(C), pages 1-9.
    12. Daniyal Alghazzawi & Omaimah Bamasag & Aiiad Albeshri & Iqra Sana & Hayat Ullah & Muhammad Zubair Asghar, 2022. "Efficient Prediction of Court Judgments Using an LSTM+CNN Neural Network Model with an Optimal Feature Set," Mathematics, MDPI, vol. 10(5), pages 1-30, February.
    13. Yang, Guancan & Lu, Guoxuan & Xu, Shuo & Chen, Liang & Wen, Yuxin, 2023. "Which type of dynamic indicators should be preferred to predict patent commercial potential?," Technological Forecasting and Social Change, Elsevier, vol. 193(C).
    14. Mindock, Maxwell R. & Waddell, Glen R., 2019. "Vote Influence in Group Decision-Making: The Changing Role of Justices' Peers on the Supreme Court," IZA Discussion Papers 12317, Institute of Labor Economics (IZA).
    15. Zhong, Weifeng & Chan, Julian & Ho, Kwan-Yuet & Lee, Kit, 2020. "Words Speak Louder Than Numbers: Estimating China’s COVID Severity with Deep Learning," Working Papers 10955, George Mason University, Mercatus Center.
    16. Frederike Zufall & Rampei Kimura & Linyu Peng, 2021. "Towards a simple mathematical model for the legal concept of balancing of interests," Discussion Paper Series of the Max Planck Institute for Research on Collective Goods 2021_09, Max Planck Institute for Research on Collective Goods, revised 19 Oct 2021.
