A challenging, contamination-free LLM benchmark.
A comprehensive benchmarking platform designed to evaluate large models' mathematical abilities across 20 fields and nearly 30,000 math problems.
A biomedical question-answering benchmark for answering research-related questions using PubMed abstracts (see the loading sketch after this list).
A benchmark platform for evaluating large language models (LLMs) on a range of tasks, particularly natural language understanding, reasoning, and generalization. It examines how models behave across scenarios and analyzes results from an interpretability perspective.
A large-scale Document Visual Question Answering (VQA) dataset designed for complex document understanding, particularly in financial reports.
A ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures; it produces highly accurate model rankings while running locally and quickly (a minimal evaluation sketch follows this list).
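To make the biomedical entry concrete, here is a minimal sketch of loading and inspecting a PubMed-abstract QA dataset with the Hugging Face `datasets` library. The dataset id `pubmed_qa`, the config `pqa_labeled`, and the field names are assumptions based on the public PubMedQA release, not details taken from the text above.

```python
# Hypothetical sketch: inspecting a biomedical QA benchmark built on
# PubMed abstracts. Dataset id, config, and field names are assumptions.
from datasets import load_dataset

ds = load_dataset("pubmed_qa", "pqa_labeled", split="train")

example = ds[0]
print(example["question"])        # research-style question
print(example["context"])         # supporting PubMed abstract text
print(example["final_decision"])  # gold answer: yes / no / maybe
```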
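For the mixture-based entry, the sketch below illustrates what "ground-truth-based, local, and fast" evaluation can look like: sample items from a mixture of source benchmarks and score model outputs by exact match against gold answers. `query_model` and the toy mixture are hypothetical placeholders, not the benchmark's actual composition or scoring rule.

```python
# Minimal sketch of ground-truth-based evaluation over a mixture of
# off-the-shelf benchmarks, run locally. All names here are illustrative.
import random

def query_model(question: str) -> str:
    # Placeholder: replace with a call into a locally hosted model.
    return "42"

# Each source benchmark contributes (question, gold answer) pairs.
mixture = {
    "math": [("What is 6 * 7?", "42"), ("What is 12 + 16?", "28")],
    "qa":   [("Capital of France?", "Paris")],
}

def evaluate(mixture: dict[str, list[tuple[str, str]]], n: int = 3) -> float:
    """Sample n items across the mixture and score by exact match."""
    pool = [item for items in mixture.values() for item in items]
    sample = random.sample(pool, min(n, len(pool)))
    correct = sum(
        query_model(q).strip().lower() == gold.strip().lower()
        for q, gold in sample
    )
    return correct / len(sample)

print(f"accuracy: {evaluate(mixture):.2f}")
```

Because scoring is a string comparison against stored ground truth, no judge model is needed, which is what lets this style of benchmark run locally and quickly.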