## 🏆 Leaderboard
| Rank | Model | Overall | Finance | Law | Code | Game |
|---|---|---|---|---|---|---|
## 📦 Installation
```bash
conda create -n loogle-v2 python=3.10
conda activate loogle-v2
pip install -r requirements.txt

# Flash Attention
pip install flash-attn==2.6.3 --no-build-isolation
```
## 📊 Dataset
```bash
# Clone from Hugging Face
git clone https://huggingface.co/datasets/MuLabPKU/LooGLE-v2 ./datasets/LooGLE-v2

# Or using the HF CLI
hf download MuLabPKU/LooGLE-v2 --repo-type dataset --local-dir ./datasets/LooGLE-v2
```
## 🛠️ Usage Workflow

### 1. Configuration (`config/models.jsonl`)
```json
{
    "name": "your-model-name",
    "model": "path/to/model",
    "max_len": 131072,
    "base_url": "http://localhost:8000/v1",
    "api_key": "your-api-key"
}
```
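Each line of `config/models.jsonl` is one standalone JSON object. As a minimal sketch (not the repository's actual code), a predictor could look up a model's entry like this, assuming the field names from the example above:

```python
import json

def load_model_config(path, name):
    """Return the config dict whose "name" field matches, or None if absent."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines in the JSONL file
                continue
            entry = json.loads(line)
            if entry.get("name") == name:
                return entry
    return None
```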
### 2. (Optional) Pre-compute RAG Contexts
```bash
python rag_preprocess.py \
    --input_path ./datasets/LooGLE-v2 \
    --split test \
    --output_path ./datasets/LooGLE-v2/test_rag.jsonl \
    --embedding_model THUDM/LongCite-glm4-9b \
    --devices 0,1
```
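The retrieval idea behind this step can be illustrated with a toy sketch: embed the document chunks and the question, then keep the top-scoring chunks as context. The bag-of-words "embedding" below is a stand-in for illustration only; the actual script uses the neural model passed via `--embedding_model`:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector -- a stand-in for the real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(chunks, question, k=2):
    # Rank pre-split document chunks by similarity to the question.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]
```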
### 3. Running Predictions
Choose your preferred inference method:
Option A: vLLM (OpenAI-compatible API). Launch the vLLM server first:
```bash
python -m vllm.entrypoints.openai.api_server \
    --model path/to/your/model \
    --port 8000 \
    --max-model-len 131072
```
Then run prediction:
```bash
python predict.py \
    --model your-model-name \
    --data_dir ./datasets/LooGLE-v2 \
    --save_dir ./results \
    --max_new_tokens 512
```
Option B: Hugging Face Transformers. Run local inference:
```bash
python predict_transformers.py \
    --model your-model-name \
    --data_dir ./datasets/LooGLE-v2 \
    --save_dir ./results \
    --max_new_tokens 512 \
    --load_in_4bit  # Optional quantization
```
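For the API-based route, the client side boils down to POSTing a JSON payload to `base_url/chat/completions`. A hypothetical standard-library sketch of building such a request (the endpoint shape follows the OpenAI API; the function and parameter names here are illustrative, not `predict.py`'s actual code):

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, prompt, max_tokens=512):
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

With a server from Option A running, `urllib.request.urlopen(req)` on the returned request yields the completion JSON.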
## 📈 Evaluation
```bash
# Evaluate a single file
python evaluate.py --input_path ./results/your-model-name.jsonl

# Batch evaluation
python evaluate.py --input_path ./results --batch --output_json summary.json
```
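As a rough illustration of what per-file evaluation aggregates, here is a sketch of exact-match scoring over one results file. The `pred` and `answer` field names are assumptions for illustration, not necessarily the schema `evaluate.py` actually reads:

```python
import json

def score_file(path):
    """Exact-match accuracy over one results .jsonl file.

    NOTE: the "pred" and "answer" field names are illustrative
    assumptions and may differ from evaluate.py's actual schema.
    """
    total = correct = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            total += 1
            if rec["pred"].strip() == rec["answer"].strip():
                correct += 1
    return correct / total if total else 0.0
```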
## 📖 Citation
```bibtex
@article{he2025loogle,
  title={LooGLE v2: Are LLMs Ready for Real World Long Dependency Challenges?},
  author={He, Ziyuan and Wang, Yuxuan and Li, Jiaqi and Liang, Kexin and Zhang, Muhan},
  journal={arXiv preprint arXiv:2510.22548},
  year={2025}
}
```