Leaderboard
OlympicArena
A benchmark for evaluating AI models across multiple academic disciplines, including math, physics, chemistry, and biology.
A Swedish language understanding benchmark that evaluates natural language processing (NLP) models on tasks such as argumentation analysis, semantic similarity, and textual entailment.