an evaluation benchmark focused on ancient Chinese language comprehension.
a meta-benchmark that evaluates how well factuality evaluators assess the outputs of large language models (LLMs).
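A minimal sketch (with hypothetical data and function names, not the benchmark's actual code) of what such a meta-benchmark measures: an evaluator's factuality verdicts on LLM outputs are compared against human-annotated gold labels, and agreement is reported.

```python
def score_evaluator(evaluator_verdicts, gold_labels):
    """Fraction of LLM outputs on which the factuality evaluator agrees with human labels."""
    assert len(evaluator_verdicts) == len(gold_labels)
    agree = sum(v == g for v, g in zip(evaluator_verdicts, gold_labels))
    return agree / len(gold_labels)

# Example: True = "output is factual", False = "output contains a factual error".
verdicts = [True, False, True, True]   # what the factuality evaluator decided
gold     = [True, False, False, True]  # what human annotators decided
print(score_evaluator(verdicts, gold))  # 0.75
```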
a benchmark that evaluates large language models on a variety of multimodal reasoning tasks, including language, natural and social sciences, physical and social commonsense, temporal reasoning, algebra, and geometry.
a large-scale question-answering benchmark focused on real-world financial data, integrating both tabular and textual information.
a challenging, contamination-free benchmark for large language models.
a comprehensive benchmarking platform designed to evaluate large models' mathematical abilities across 20 fields and nearly 30,000 math problems.
a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
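Crowdsourced battle platforms of this kind typically aggregate pairwise win/loss votes into Elo-style ratings. The following is a minimal illustrative sketch of such an online Elo update (an assumption for clarity, not the platform's actual implementation; model names are hypothetical).

```python
def update_elo(ratings, winner, loser, k=32, base=1500):
    """Apply a single online Elo update for one pairwise battle outcome."""
    ra = ratings.get(winner, base)
    rb = ratings.get(loser, base)
    expected_win = 1.0 / (1.0 + 10 ** ((rb - ra) / 400))  # winner's expected score
    ratings[winner] = ra + k * (1.0 - expected_win)
    ratings[loser] = rb - k * (1.0 - expected_win)
    return ratings

# Example: each vote is a (winning model, losing model) pair from an anonymous battle.
battles = [("model_a", "model_b"), ("model_b", "model_c"), ("model_a", "model_c")]
ratings = {}
for winner, loser in battles:
    update_elo(ratings, winner, loser)
print(sorted(ratings.items(), key=lambda kv: -kv[1]))  # ranking by rating
```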