instruct-eval | LLMWay – The Way To LLM

Evaluation

instruct-eval

This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.

A unified platform from the LangChain framework for evaluation, human-in-the-loop (HITL) collaboration, logging, and monitoring of LLM applications.

Ragas 11,585

A framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines.

lm-evaluation-harness 10,787

A framework for few-shot evaluation of language models.

OLMO-eval 370

A repository for evaluating open language models.

lighteval 2,150

A lightweight LLM evaluation suite that Hugging Face has been using internally.

simple-evals 4,195

Eval tools by OpenAI.