A framework for few-shot evaluation of language models.
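To make "few-shot evaluation" concrete, here is a minimal sketch of what such a harness does under the hood: pack a handful of labeled demonstrations into the prompt, then score each candidate answer for a held-out example by the log-likelihood the model assigns to it. The model name (`gpt2`), the toy sentiment task, and the `label_logprob` helper are illustrative assumptions, not the API of any particular framework.

```python
# Minimal sketch of few-shot (in-context) evaluation with a causal LM.
# Model, task, and scoring are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical choice; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Tiny hand-written task: a few labeled demonstrations plus one held-out item.
few_shot_examples = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute of it.", "negative"),
    ("An absolute masterpiece.", "positive"),
]
test_text, test_label = "Dull and far too long.", "negative"

# Build a k-shot prompt: demonstrations followed by the unlabeled query.
prompt = "".join(f"Review: {t}\nSentiment: {l}\n\n" for t, l in few_shot_examples)
prompt += f"Review: {test_text}\nSentiment:"

def label_logprob(label: str) -> float:
    """Log-likelihood the model assigns to `label` when appended to the prompt."""
    full = tokenizer(prompt + " " + label, return_tensors="pt")
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(**full).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # predicts tokens 1..T-1
    target_ids = full.input_ids[0, 1:]
    # Only the label tokens (those after the prompt) contribute to the score.
    label_scores = log_probs[prompt_len - 1:, :].gather(
        1, target_ids[prompt_len - 1:].unsqueeze(1)
    )
    return label_scores.sum().item()

prediction = max(["positive", "negative"], key=label_logprob)
print(prediction == test_label)  # True if the 3-shot prediction is correct
```

Real harnesses generalize exactly this loop across hundreds of tasks, prompt templates, and models, and report aggregate metrics instead of a single prediction.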
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
A repository for evaluating open language models.
A lightweight LLM evaluation suite that Hugging Face has been using internally.
A reliable click-and-go evaluation suite compatible with both open-source and proprietary models, supporting MixEval and other benchmarks.
A unified platform from the LangChain framework for evaluation, human-in-the-loop (HITL) collaboration, logging, and monitoring of LLM applications.
A testing and evaluation library for LLM applications, particularly RAG pipelines.
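As a concrete example of what RAG-focused evaluation measures, the sketch below scores a single question/answer record on two common axes: whether retrieval surfaced the reference answer at all, and token-level F1 between the generated and reference answers. It is framework-agnostic; the record, metric names, and helper functions are illustrative assumptions rather than any particular library's API.

```python
# Framework-agnostic sketch of two simple RAG evaluation metrics:
# retrieval hit rate and answer token-level F1. The record is illustrative.
import re
from collections import Counter

def _tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

def retrieval_hit(contexts: list[str], reference: str) -> bool:
    """Did any retrieved chunk contain the reference answer verbatim?"""
    return any(reference.lower() in c.lower() for c in contexts)

def answer_f1(prediction: str, reference: str) -> float:
    """Token-level F1 overlap between generated and reference answers."""
    pred, ref = _tokens(prediction), _tokens(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# One hypothetical evaluation record from a RAG pipeline.
record = {
    "question": "What year was the Eiffel Tower completed?",
    "contexts": ["The Eiffel Tower was completed in 1889 for the World's Fair."],
    "answer": "It was completed in 1889.",
    "reference": "1889",
}

print(retrieval_hit(record["contexts"], record["reference"]))      # True
print(round(answer_f1(record["answer"], record["reference"]), 2))  # 0.33
```

Dedicated RAG evaluation libraries typically add LLM-judged metrics (faithfulness, answer relevance, context precision) on top of lexical checks like these.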