Printed from https://ideas.repec.org/p/arx/papers/2510.05710.html

FinReflectKG - EvalBench: Benchmarking Financial KG with Multi-Dimensional Evaluation

Author

Listed:
  • Fabrizio Dimino
  • Abhinav Arun
  • Bhaskarjit Sarmah
  • Stefano Pasquali

Abstract

Large language models (LLMs) are increasingly used to extract structured knowledge from unstructured financial text. Although prior studies have explored various extraction methods, there is no universal benchmark or unified evaluation framework for constructing financial knowledge graphs (KGs). We introduce FinReflectKG - EvalBench, a benchmark and evaluation framework for KG extraction from SEC 10-K filings. It builds on the agentic and holistic evaluation principles of FinReflectKG, a financial KG that links audited triples to source chunks from S&P 100 filings and supports single-pass, multi-pass, and reflection-agent-based extraction modes. EvalBench implements a deterministic commit-then-justify judging protocol with explicit bias controls that mitigate position effects, leniency, verbosity, and reliance on world knowledge. Each candidate triple receives binary judgments of faithfulness, precision, and relevance, while comprehensiveness is assessed at the chunk level on a three-level ordinal scale (good, partial, bad). Our findings suggest that, when equipped with explicit bias controls, LLM-as-Judge protocols provide a reliable and cost-efficient alternative to human annotation while also enabling structured error analysis. Reflection-based extraction emerges as the strongest approach overall, achieving the best performance in comprehensiveness, precision, and relevance, while single-pass extraction maintains the highest faithfulness. By aggregating these complementary dimensions, FinReflectKG - EvalBench enables fine-grained benchmarking and bias-aware evaluation, advancing transparency and governance in financial AI applications.
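The scoring scheme described in the abstract (three binary judgments per triple, plus a three-level comprehensiveness label per chunk) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `TripleJudgment`, `triple_pass_rates`, and `comprehensiveness_distribution` are hypothetical, the sample judgments are made up for illustration, and simple pass rates are only one assumed way of aggregating the dimensions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Ordinal comprehensiveness labels named in the abstract, best to worst.
COMPREHENSIVENESS_LEVELS = ("good", "partial", "bad")


@dataclass(frozen=True)
class TripleJudgment:
    """Hypothetical record of the three binary verdicts for one candidate triple."""
    faithful: bool
    precise: bool
    relevant: bool


def triple_pass_rates(judgments: List[TripleJudgment]) -> Dict[str, float]:
    """Fraction of triples that pass each binary dimension (assumed aggregation)."""
    if not judgments:
        raise ValueError("no judgments to aggregate")
    n = len(judgments)
    return {
        "faithfulness": sum(j.faithful for j in judgments) / n,
        "precision": sum(j.precise for j in judgments) / n,
        "relevance": sum(j.relevant for j in judgments) / n,
    }


def comprehensiveness_distribution(labels: List[str]) -> Dict[str, float]:
    """Share of chunks at each comprehensiveness level."""
    if not labels:
        raise ValueError("no chunk labels to aggregate")
    unknown = set(labels) - set(COMPREHENSIVENESS_LEVELS)
    if unknown:
        raise ValueError(f"unknown comprehensiveness labels: {sorted(unknown)}")
    counts = Counter(labels)
    return {level: counts[level] / len(labels) for level in COMPREHENSIVENESS_LEVELS}


# Illustrative (made-up) judgments for four triples and four chunks.
sample_judgments = [
    TripleJudgment(faithful=True, precise=True, relevant=True),
    TripleJudgment(faithful=True, precise=False, relevant=True),
    TripleJudgment(faithful=False, precise=False, relevant=True),
    TripleJudgment(faithful=True, precise=True, relevant=False),
]
sample_labels = ["good", "good", "partial", "bad"]

rates = triple_pass_rates(sample_judgments)
distribution = comprehensiveness_distribution(sample_labels)
print(rates)
print(distribution)
```

Reporting each dimension separately, rather than collapsing them into one score, keeps trade-offs visible, such as the one the abstract reports between reflection-based and single-pass extraction.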

Suggested Citation

  • Fabrizio Dimino & Abhinav Arun & Bhaskarjit Sarmah & Stefano Pasquali, 2025. "FinReflectKG - EvalBench: Benchmarking Financial KG with Multi-Dimensional Evaluation," Papers 2510.05710, arXiv.org.
  • Handle: RePEc:arx:papers:2510.05710

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2510.05710
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Abhinav Arun & Fabrizio Dimino & Tejas Prakash Agarwal & Bhaskarjit Sarmah & Stefano Pasquali, 2025. "FinReflectKG: Agentic Construction and Evaluation of Financial Knowledge Graphs," Papers 2508.17906, arXiv.org, revised Oct 2025.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Abhinav Arun & Reetu Raj Harsh & Bhaskarjit Sarmah & Stefano Pasquali, 2025. "FinReflectKG -- MultiHop: Financial QA Benchmark for Reasoning with Knowledge Graph Evidence," Papers 2510.02906, arXiv.org.
