M3CoT | LLMWay – The Way To LLM

M3CoT

M3CoT is a benchmark that evaluates large language models on a variety of multimodal reasoning tasks, covering language, natural and social sciences, physical and social commonsense, temporal reasoning, algebra, and geometry.

Relevant Sites

MMedBench

A benchmark that evaluates large language models' ability to answer medical questions across multiple languages.

MathEval

A comprehensive benchmarking platform designed to evaluate large models' mathematical abilities, covering 20 fields and nearly 30,000 math problems.

LLMEval

LLMEval focuses on understanding how large language models perform in various scenarios and on analyzing the results from an interpretability perspective.

CompassRank

CompassRank is dedicated to exploring the most advanced language and visual models, offering a comprehensive, objective, and neutral evaluation reference for industry and research.
