Benchmarks for LLMs
A benchmark for evaluating AI models across multiple academic disciplines, such as math, physics, chemistry, and biology.
An expert-driven benchmark for Chinese LLMs.
A ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures; it produces a highly accurate model ranking while running locally and quickly.
A large-scale question-answering benchmark focused on real-world financial data, integrating both tabular and textual information.
Aims to track, rank, and evaluate LLMs and chatbots as they are released.
A pioneering benchmark specifically designed to assess honesty in LLMs comprehensively.