Leaderboard
DreamBench++
a benchmark for evaluating large language models (LLMs) on tasks involving both textual and visual imagination.
a meta-benchmark that evaluates how well factuality evaluators assess the outputs of large language models (LLMs).