Welcome to RLHN

RLHN (ReLabeling Hard Negatives) uses a cascading LLM framework to identify and relabel false negatives in information retrieval (IR) training datasets.
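
As a rough illustration of the cascading idea, the sketch below passes each hard negative through a cheap judge first and sends only the flagged passages to a stronger judge for confirmation. Everything here is an assumption made for illustration: the function names, the two judge callables, and the "query" / "positives" / "negatives" field names are hypothetical and are not taken from the RLHN codebase or datasets. In practice each judge would wrap an LLM call; moving confirmed false negatives into the positives is only one possible treatment.

```python
from typing import Callable, Dict, List

# Hypothetical judge signature: (query, passage) -> True if the passage is relevant.
Judge = Callable[[str, str], bool]


def find_false_negatives(
    query: str,
    hard_negatives: List[str],
    cheap_judge: Judge,
    strong_judge: Judge,
) -> List[str]:
    """Return the hard negatives that both judges consider relevant."""
    # Stage 1: the cheap judge screens every hard negative.
    candidates = [p for p in hard_negatives if cheap_judge(query, p)]
    # Stage 2: the stronger judge re-checks only the flagged candidates.
    return [p for p in candidates if strong_judge(query, p)]


def relabel(example: Dict, cheap_judge: Judge, strong_judge: Judge) -> Dict:
    """Move confirmed false negatives from the negatives to the positives."""
    false_negatives = find_false_negatives(
        example["query"], example["negatives"], cheap_judge, strong_judge
    )
    return {
        "query": example["query"],
        "positives": example["positives"] + false_negatives,
        "negatives": [p for p in example["negatives"] if p not in false_negatives],
    }
```

With toy keyword-based judges standing in for the LLM calls, relabel(example, cheap_judge, strong_judge) returns a new training example and leaves the input dictionary unmodified.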

This repository contains the training datasets curated with RLHN and the models fine-tuned on those datasets.

List of Contributors:

Nandan Thakur*
Crystina Zhang*
Xueguang Ma
Jimmy Lin

Preprint URL: https://huggingface.co/papers/2505.16967

Citation

@misc{thakur2025rlhn,
      title={Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval}, 
      author={Nandan Thakur and Crystina Zhang and Xueguang Ma and Jimmy Lin},
      year={2025},
      eprint={2505.16967},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2505.16967}, 
}