IDEAS home Printed from https://ideas.repec.org/a/eee/phsmap/v692y2026ics0378437126002815.html

Impacts of data splitting strategies on parameterized link prediction algorithms

Author

Listed:
  • Jiao, Xinshan
  • Luo, Yuxin
  • Bi, Yilin
  • Zhou, Tao

Abstract

Link prediction is a fundamental problem in network science, aiming to infer potential or missing links based on observed network structures. With the increasing adoption of parameterized models, the rigor of evaluation protocols has become critically important. However, a previously common practice of using the test set during hyperparameter tuning has led to human-induced information leakage, thereby inflating the reported model performance. To address this issue, this study introduces a novel evaluation metric, Loss Ratio, which quantitatively measures the extent of performance overestimation. We conduct large-scale experiments on 60 real-world networks across six domains. The results demonstrate that the information leakage leads to an average overestimation of about 3.6%, with the bias reaching over 15% for specific algorithms. Meanwhile, heuristic and random-walk-based methods exhibit greater robustness and stability. The analysis uncovers a pervasive information leakage issue in link prediction evaluation and underscores the necessity of adopting standardized data splitting strategies to enable fair and reproducible benchmarking of link prediction models.

Suggested Citation

  • Jiao, Xinshan & Luo, Yuxin & Bi, Yilin & Zhou, Tao, 2026. "Impacts of data splitting strategies on parameterized link prediction algorithms," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 692(C).
  • Handle: RePEc:eee:phsmap:v:692:y:2026:i:c:s0378437126002815
    DOI: 10.1016/j.physa.2026.131545
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437126002815
    Download Restriction: Full text for ScienceDirect subscribers only. Journal offers the option of making the article available online on Science direct for a fee of $3,000

    File URL: https://libkey.io/10.1016/j.physa.2026.131545?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to

    for a different version of it.

    More about this item

    Keywords

    ;
    ;
    ;
    ;
    ;
    ;

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:phsmap:v:692:y:2026:i:c:s0378437126002815. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.journals.elsevier.com/physica-a-statistical-mechpplications/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.