Leaderboard
VisualWebArena
A benchmark designed to assess the performance of multimodal web agents on realistic, visually grounded tasks.
A benchmark designed to evaluate the ability of large language models (LLMs) to answer real-world coding-related questions.