Leaderboard
VisualWebArena
A benchmark designed to assess the performance of multimodal web agents on realistic, visually grounded tasks.
The leaderboard aims to track, rank, and evaluate LLMs and chatbots as they are released.