TAT-DQA | LLMWay – The Way To LLM


TAT-DQA

a large-scale Document Visual Question Answering (VQA) dataset designed for complex document understanding, particularly in financial reports.


Relevant Sites

M3CoT

a benchmark that evaluates large language models on a variety of multimodal reasoning tasks, including language, natural and social sciences, physical and social commonsense, temporal reasoning, algebra, and geometry.

InfiBench

a benchmark designed to evaluate large language models (LLMs) specifically on their ability to answer real-world coding-related questions.

FELM

a meta-benchmark that evaluates how well factuality evaluators assess the outputs of large language models (LLMs).

WHOOPS!

a benchmark dataset testing AI's ability to reason about visual commonsense through images that defy normal expectations.

Chatbot Arena Leaderboard

a crowdsourced benchmark platform for large language models (LLMs) that ranks models through anonymous, randomized head-to-head battles.
