OlympicArena | LLMWay – The Way To LLM

Leaderboard

OlympicArena

a benchmark for evaluating AI models across multiple academic disciplines like math, physics, chemistry, biology, and more.


Relevant Sites

We-Math

a benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.

SciBench

a benchmark designed to evaluate large language models (LLMs) on solving complex, college-level scientific problems from domains like chemistry, physics, and mathematics.

PubMedQA

a biomedical question-answering benchmark designed for answering research-related questions using PubMed abstracts.

FELM

a meta-benchmark that evaluates how well factuality evaluators assess the outputs of large language models (LLMs).

DreamBench++

a human-aligned benchmark for evaluating personalized image generation models.
