Leaderboard
MathEval
a comprehensive benchmarking platform designed to evaluate large models' mathematical abilities across 20 fields and nearly 30,000 math problems.
a comprehensive benchmarking platform designed to evaluate large models' mathematical abilities across 20 fields and nearly 30,000 math problems.
a benchmark designed to assess the performance of multimodal web agents on realistic visually grounded tasks.