Leaderboard
MMToM-QA
A multimodal question-answering benchmark designed to evaluate AI models' cognitive ability to understand human beliefs and goals.