Leaderboard
LLMEval
focuses on understanding how these models perform in various scenarios and on analyzing the results from an interpretability perspective.
a benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.