MMedBench | LLMWay – The Way To LLM

Leaderboard

MMedBench

a benchmark that evaluates large language models' ability to answer medical questions across multiple languages.

Relevant Sites

MixEval

a ground-truth-based dynamic benchmark built from mixtures of off-the-shelf benchmarks, which produces highly reliable model rankings while running locally and quickly.

WHOOPS!

a benchmark dataset testing AI's ability to reason about visual commonsense through images that defy normal expectations.

SciBench

a benchmark designed to evaluate large language models (LLMs) on complex, college-level scientific problems from domains such as chemistry, physics, and mathematics.

PubMedQA

a biomedical question-answering benchmark for answering research questions using PubMed abstracts.
