Leaderboard
AlpacaEval
An automatic evaluator for instruction-following language models, using the Nous benchmark suite.
A crowdsourced benchmark platform for large language models (LLMs) that features anonymous, randomized battles.