Leaderboard
OlympicArena
A benchmark for evaluating AI models across multiple academic disciplines, such as math, physics, chemistry, and biology.
A benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.