Leaderboard
AlpacaEval
An Automatic Evaluator for Instruction-following Language Models using the Nous benchmark suite.
Evaluates an LLM's ability to call external functions and tools.