Leaderboard
MixEval
a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures, which evaluates LLMs with a highly capable model ranking (closely correlated with Chatbot Arena) while running locally, quickly, and cheaply.
a benchmark designed to evaluate large language models (LLMs) on their ability to answer real-world coding-related questions.