an expert-driven benchmark for Chinese LLMs.
a benchmark platform for evaluating large language models (LLMs) on a range of tasks, with particular focus on natural language understanding, reasoning, and generalization.
A benchmark for evaluating LLMs.
a benchmark for evaluating AI models across multiple academic disciplines, such as math, physics, chemistry, and biology.
a benchmark designed to assess the performance of multimodal web agents on realistic visually grounded tasks.
a benchmark dataset testing AI's ability to reason about visual commonsense through images that defy normal expectations.
focuses on understanding how LLMs perform in various scenarios and on analyzing the results from an interpretability perspective.