an expert-driven benchmark for Chinese LLMs.
a comprehensive benchmarking platform designed to evaluate large models' mathematical abilities across 20 fields and nearly 30,000 math problems.
a benchmark designed to assess the performance of multimodal web agents on realistic visually grounded tasks.
An automatic evaluator for instruction-following language models using the Nous benchmark suite.
a benchmark for evaluating the performance of large language models (LLMs) on various tasks involving both textual and visual imagination.
CompassRank is dedicated to evaluating the most advanced language and vision models, offering a comprehensive, objective, and neutral evaluation reference for industry and research.
A Challenging, Contamination-Free LLM Benchmark.