Leaderboard
We-Math
A benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.
A large-scale Document Visual Question Answering (VQA) dataset designed for complex document understanding, particularly of financial reports.