Leaderboard
WHOOPS!
a benchmark dataset testing AI's ability to reason about visual commonsense through images that defy normal expectations.
a Swedish language understanding benchmark that evaluates natural language processing (NLP) models on tasks such as argumentation analysis, semantic similarity, and textual entailment.