Leaderboard
MMedBench
a benchmark that evaluates large language models' ability to answer medical questions across multiple languages.
a benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.