Leaderboard
WHOOPS!
A benchmark dataset that tests AI's ability to reason about visual commonsense using images that defy normal expectations.
A benchmark for evaluating the performance of large language models (LLMs) on various tasks related to both textual and visual imagination.