Leaderboard
MixEval
a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures, which evaluates LLMs with a highly capable model ranking while running locally and quickly.
a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures, which evaluates LLMs with a highly capable model ranking while running locally and quickly.
a large-scale Document Visual Question Answering (VQA) dataset designed for complex document understanding, particularly in financial reports.