## 🏆 Leaderboard
| Rank | Model | Overall | Finance | Law | Code | Game |
|---|---|---|---|---|---|---|
## 📦 Installation
```bash
conda create -n loogle-v2 python=3.10
conda activate loogle-v2
pip install -r requirements.txt

# Flash Attention
pip install flash-attn==2.6.3 --no-build-isolation
```
## 📊 Dataset
```bash
# Clone from Hugging Face
git clone https://huggingface.co/datasets/MuLabPKU/LooGLE-v2 ./datasets/LooGLE-v2

# Or using the HF CLI
hf download MuLabPKU/LooGLE-v2 --repo-type dataset --local-dir ./datasets/LooGLE-v2
```
## 🛠️ Usage Workflow

### 1. Configuration (`config/models.jsonl`)
```json
{
    "name": "your-model-name",
    "model": "path/to/model",
    "max_len": 131072,
    "base_url": "http://localhost:8000/v1",
    "api_key": "your-api-key"
}
```
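Each line of `config/models.jsonl` is one standalone JSON object. As a minimal sketch (not the repository's actual code), a predictor could look up a model's entry like this, assuming the field names from the example above:

```python
import json

def load_model_config(path, name):
    """Return the config dict whose "name" field matches, or None if absent."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines in the JSONL file
                continue
            entry = json.loads(line)
            if entry.get("name") == name:
                return entry
    return None
```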
### 2. (Optional) Pre-compute RAG Contexts
```bash
python rag_preprocess.py \
    --input_path ./datasets/LooGLE-v2 \
    --split test \
    --output_path ./datasets/LooGLE-v2/test_rag.jsonl \
    --embedding_model THUDM/LongCite-glm4-9b \
    --devices 0,1
```
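The retrieval idea behind this step can be illustrated with a toy sketch: embed the document chunks and the question, then keep the top-scoring chunks as context. The bag-of-words "embedding" below is a stand-in for illustration only; the actual script uses the neural model passed via `--embedding_model`:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector -- a stand-in for the real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(chunks, question, k=2):
    # Rank pre-split document chunks by similarity to the question.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]
```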
### 3. Running Predictions
Choose your preferred inference method:
Option A: vLLM (OpenAI-compatible API). Launch the vLLM server first:
```bash
python -m vllm.entrypoints.openai.api_server \
    --model path/to/your/model \
    --port 8000 \
    --max-model-len 131072
```
Then run prediction:
```bash
python predict.py \
    --model your-model-name \
    --data_dir ./datasets/LooGLE-v2 \
    --save_dir ./results \
    --max_new_tokens 512
```
Option B: Hugging Face Transformers. Run local inference:
```bash
python predict_transformers.py \
    --model your-model-name \
    --data_dir ./datasets/LooGLE-v2 \
    --save_dir ./results \
    --max_new_tokens 512 \
    --load_in_4bit  # Optional quantization
```
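For the API-based route, the client side boils down to POSTing a JSON payload to `base_url/chat/completions`. A hypothetical standard-library sketch of building such a request (the endpoint shape follows the OpenAI API; the function and parameter names here are illustrative, not `predict.py`'s actual code):

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, prompt, max_tokens=512):
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

With a server from Option A running, `urllib.request.urlopen(req)` on the returned request yields the completion JSON.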
## 📈 Evaluation
```bash
# Evaluate a single file
python evaluate.py --input_path ./results/your-model-name.jsonl

# Batch evaluation
python evaluate.py --input_path ./results --batch --output_json summary.json
```
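As a rough illustration of what per-file evaluation aggregates, here is a sketch of exact-match scoring over one results file. The `pred` and `answer` field names are assumptions for illustration, not necessarily the schema `evaluate.py` actually reads:

```python
import json

def score_file(path):
    """Exact-match accuracy over one results .jsonl file.

    NOTE: the "pred" and "answer" field names are illustrative
    assumptions and may differ from evaluate.py's actual schema.
    """
    total = correct = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            total += 1
            if rec["pred"].strip() == rec["answer"].strip():
                correct += 1
    return correct / total if total else 0.0
```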
## 📖 Citation
```bibtex
@article{he2025loogle,
  title={LooGLE v2: Are LLMs Ready for Real World Long Dependency Challenges?},
  author={He, Ziyuan and Wang, Yuxuan and Li, Jiaqi and Liang, Kexin and Zhang, Muhan},
  journal={arXiv preprint arXiv:2510.22548},
  year={2025}
}
```