Evaluation | Site Categories | LLMWay

A framework for few-shot evaluation of language models.

A reliable click-and-go evaluation suite compatible with both open-source and proprietary models, supporting MixEval and other benchmarks.

lighteval 1,783

a lightweight LLM evaluation suite that Hugging Face has been using internally.

OLMO-eval 355

a repository for evaluating open language models.

instruct-eval 546

This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.

simple-evals 3,891

Eval tools by OpenAI.

Giskard 4,752

Testing & evaluation library for LLM applications, in particular RAGs

a unified platform from LangChain framework for: evaluation, collaboration HITL (Human In The Loop), logging and monitoring LLM applications.

Ragas 10,155

a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines.