A framework for few-shot evaluation of language models.
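To make "few-shot evaluation" concrete, here is a minimal sketch of what such a harness does under the hood: pack a handful of labeled demonstrations into the prompt, then score each candidate answer for a held-out example by the log-likelihood the model assigns to it. The model name (`gpt2`), the toy sentiment task, and the `label_logprob` helper are illustrative assumptions, not the API of any particular framework.

```python
# Minimal sketch of few-shot (in-context) evaluation with a causal LM.
# Model, task, and scoring are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical choice; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Tiny hand-written task: a few labeled demonstrations plus one held-out item.
few_shot_examples = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute of it.", "negative"),
    ("An absolute masterpiece.", "positive"),
]
test_text, test_label = "Dull and far too long.", "negative"

# Build a k-shot prompt: demonstrations followed by the unlabeled query.
prompt = "".join(f"Review: {t}\nSentiment: {l}\n\n" for t, l in few_shot_examples)
prompt += f"Review: {test_text}\nSentiment:"

def label_logprob(label: str) -> float:
    """Log-likelihood the model assigns to `label` when appended to the prompt."""
    full = tokenizer(prompt + " " + label, return_tensors="pt")
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(**full).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # predicts tokens 1..T-1
    target_ids = full.input_ids[0, 1:]
    # Only the label tokens (those after the prompt) contribute to the score.
    label_scores = log_probs[prompt_len - 1:, :].gather(
        1, target_ids[prompt_len - 1:].unsqueeze(1)
    )
    return label_scores.sum().item()

prediction = max(["positive", "negative"], key=label_logprob)
print(prediction == test_label)  # True if the 3-shot prediction is correct
```

Real harnesses generalize exactly this loop across hundreds of tasks, prompt templates, and models, and report aggregate metrics instead of a single prediction.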
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
A repository for evaluating open language models.
A lightweight LLM evaluation suite that Hugging Face has been using internally.
A reliable click-and-go evaluation suite compatible with both open-source and proprietary models, supporting MixEval and other benchmarks.
A unified platform from the LangChain framework for evaluation, human-in-the-loop (HITL) collaboration, logging, and monitoring of LLM applications.
A testing and evaluation library for LLM applications, particularly RAG pipelines.
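As a concrete example of what RAG-focused evaluation measures, the sketch below scores a single question/answer record on two common axes: whether retrieval surfaced the reference answer at all, and token-level F1 between the generated and reference answers. It is framework-agnostic; the record, metric names, and helper functions are illustrative assumptions rather than any particular library's API.

```python
# Framework-agnostic sketch of two simple RAG evaluation metrics:
# retrieval hit rate and answer token-level F1. The record is illustrative.
import re
from collections import Counter

def _tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

def retrieval_hit(contexts: list[str], reference: str) -> bool:
    """Did any retrieved chunk contain the reference answer verbatim?"""
    return any(reference.lower() in c.lower() for c in contexts)

def answer_f1(prediction: str, reference: str) -> float:
    """Token-level F1 overlap between generated and reference answers."""
    pred, ref = _tokens(prediction), _tokens(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# One hypothetical evaluation record from a RAG pipeline.
record = {
    "question": "What year was the Eiffel Tower completed?",
    "contexts": ["The Eiffel Tower was completed in 1889 for the World's Fair."],
    "answer": "It was completed in 1889.",
    "reference": "1889",
}

print(retrieval_hit(record["contexts"], record["reference"]))      # True
print(round(answer_f1(record["answer"], record["reference"]), 2))  # 0.33
```

Dedicated RAG evaluation libraries typically add LLM-judged metrics (faithfulness, answer relevance, context precision) on top of lexical checks like these.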