Leaderboard
MixEval
a ground-truth-based dynamic benchmark derived from mixtures of off-the-shelf benchmarks, which evaluates LLMs and produces a highly accurate model ranking while running locally and quickly.
a benchmark designed to evaluate large language models in the legal domain.