Leaderboard
MMToM-QA
A multimodal question-answering benchmark designed to evaluate AI models' cognitive ability to understand human beliefs and goals.