Leaderboard
We-Math
A benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.