An expert-driven benchmark for Chinese LLMs.
A benchmark for LLMs.
A pioneering benchmark specifically designed to comprehensively assess honesty in LLMs.
A biomedical question-answering benchmark for research-related questions grounded in PubMed abstracts.
A comprehensive benchmarking platform that evaluates large models' mathematical abilities across 20 fields and nearly 30,000 math problems.
Evaluates LLMs' ability to call external functions and tools (see the sketch after this list).
A Swedish language understanding benchmark that evaluates natural language processing (NLP) models on tasks such as argumentation analysis, semantic similarity, and textual entailment.
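To illustrate what a function-calling evaluation typically checks, here is a minimal sketch in Python. It is not any benchmark's actual harness: the scoring rule, the function name get_weather, and the JSON field names are illustrative assumptions. The idea is simply that the model's output is parsed as a structured tool call and compared against a reference call.

import json

def score_tool_call(model_output: str, expected_call: dict) -> bool:
    """Return True if the model's JSON tool call matches the reference call."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return False  # unparseable output counts as a failed call
    # Both the chosen function and its arguments must match exactly
    # (real harnesses may allow looser argument matching).
    return (
        call.get("name") == expected_call["name"]
        and call.get("arguments") == expected_call["arguments"]
    )

# Example: the model is expected to call a weather tool with one argument.
expected = {"name": "get_weather", "arguments": {"city": "Stockholm"}}
output = '{"name": "get_weather", "arguments": {"city": "Stockholm"}}'
print(score_tool_call(output, expected))  # True

In practice, harnesses of this kind also score partial credit (correct function, wrong arguments) and penalize hallucinated calls to functions that were never offered, but exact-match comparison as above is the simplest baseline.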