Leaderboard
Berkeley Function-Calling Leaderboard
Evaluates LLMs' ability to call external functions and tools.
A multimodal question-answering benchmark designed to evaluate AI models' cognitive ability to understand human beliefs and goals.