Leaderboard
AlpacaEval
An automatic evaluator for instruction-following language models, using the Nous benchmark suite.
A multimodal question-answering benchmark designed to evaluate AI models' cognitive ability to understand human beliefs and goals.