LawBench | LLMWay – The Way To LLM

Leaderboard

LawBench

a benchmark designed to evaluate large language models in the legal domain.

Link

a benchmark designed to evaluate large language models in the legal domain.

Relevant Sites

SuperBench

a benchmark platform designed for evaluating large language models (LLMs) on a range of tasks, particularly focusing on their performance in different aspects such as natural language understanding, reasoning, and generalization.

WHOOPS!

a benchmark dataset testing AI's ability to reason about visual commonsense through images that defy normal expectations.

Chatbot Arena Leaderboard

a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.

MixEval

a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures, which evaluates LLMs with a highly capable model ranking while running locally and quickly.

SciBench

benchmark designed to evaluate large language models (LLMs) on solving complex, college-level scientific problems from domains like chemistry, physics, and mathematics.

Relevant Sites

Leave a Reply Cancel reply