a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
A Challenging, Contamination-Free LLM Benchmark.
aims to track, rank, and evaluate LLMs and chatbots as they are released.
An Automatic Evaluator for Instruction-following Language Models using the Nous benchmark suite.
an evaluation benchmark focused on ancient Chinese language comprehension.
A benchmark designed to comprehensively assess honesty in LLMs.
Llama is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. Part of a foundational system, it serves as a bedrock for innovation in the global community. A few key aspects: Open access: Easy accessibility to cutting-edge large language models, fostering […]
Qwen2.5 is a series of large language models developed by the Alibaba Cloud Intelligence team, designed to provide powerful natural language processing capabilities. Here are some key features and advantages of the product: Model Scale: The Qwen2.5 series includes multiple model scales, ranging from 0.5B to 72B parameters, catering to different scenarios and needs. Pre-training […]
DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which have been thoroughly verified in DeepSeek-V2. Moreover, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets […]
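To make the "activated parameters per token" idea concrete, here is a minimal, generic top-k MoE routing sketch in PyTorch. The class name, layer sizes, and expert count are invented for illustration; this is not DeepSeek-V3's actual MLA/DeepSeekMoE implementation.

```python
# Generic top-k MoE layer: each token is routed to only `top_k` of the experts,
# so most expert parameters are never touched for that token. Illustrative only.
import torch
import torch.nn as nn


class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle,
        # which is why the "active" parameter count is far below the total.
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out
```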
(2025-01) DeepSeek-R1 by DeepSeek
(2024-12) Qwen2.5 by Alibaba
(2024-05) Llama3 by Meta
(2024-05) Mamba2 by CMU & Princeton
(2024-01) DeepSeek-v2 by DeepSeek
(2023-12) Mamba by CMU & Princeton
(2023-10) Mistral 7B by Mistral
(2023-07) LLaMA2 by Meta
(2023-05) Tree of Thoughts (ToT) by Google & Princeton
A high-throughput and memory-efficient inference and serving engine for LLMs; a minimal usage sketch follows this group of serving tools.
SGLang is a fast serving framework for large language models and vision language models.
a toolkit for deploying and serving Large Language Models (LLMs).
A high-throughput and low-latency inference and serving framework for LLMs and VLMs.
Get up and running with Llama 3, Mistral, Gemma, and other large language models.
NanoFlow is a throughput-oriented high-performance serving framework for LLMs. NanoFlow consistently delivers superior throughput compared to vLLM, DeepSpeed-FastGen, and TensorRT-LLM.
LLM inference in C/C++.
Open Source LLM Engineering Platform 🪢 Tracing, Evaluations, Prompt Management, and Playground.
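For the vLLM entry above, here is a minimal offline-inference sketch using vLLM's Python `LLM`/`SamplingParams` API; the model path and sampling values are placeholders, not recommendations.

```python
# Minimal vLLM offline-inference sketch: load a model once, then batch-generate.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder HF model path
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

outputs = llm.generate(["Explain KV-cache paging in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```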
veRL is a flexible and efficient RL framework for LLMs.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective; a minimal initialization sketch follows this group of training frameworks.
DeepSpeed version of NVIDIA's Megatron-LM that adds additional support for several features such as MoE model training, Curriculum Learning, 3D Parallelism, and others.
A Native-PyTorch Library for LLM Fine-tuning.
A native PyTorch Library for large model training.
Generative AI framework built for researchers and PyTorch developers working on Large Language Models (LLMs), Multimodal Models (MMs), Automatic Speech Recognition (ASR), Text to Speech (TTS), and Computer Vision (CV) domains.
Ongoing research training transformer models at scale.
Making large AI models cheaper, faster, and more accessible.
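For the DeepSpeed entry above, here is a minimal sketch of wrapping a PyTorch model with the DeepSpeed engine. The config values are illustrative assumptions rather than tuned settings, and real jobs are normally launched with the `deepspeed` launcher rather than plain `python`.

```python
# Minimal DeepSpeed sketch: wrap a model, then route forward/backward/step
# through the returned engine. Config values are placeholders.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)            # stand-in for a real model
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},          # ZeRO stage-2 optimizer sharding
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# One illustrative training step.
x = torch.randn(4, 1024).to(engine.device)
loss = engine(x).pow(2).mean()
engine.backward(loss)
engine.step()
```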
A framework for few-shot evaluation of language models; a small usage sketch follows this group of evaluation tools.
A reliable click-and-go evaluation suite compatible with both open-source and proprietary models, supporting MixEval and other benchmarks.
a lightweight LLM evaluation suite that Hugging Face has been using internally.
a repository for evaluating open language models.
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
Eval tools by OpenAI.
Testing and evaluation library for LLM applications, in particular RAG pipelines.
a unified platform from the LangChain ecosystem for evaluation, human-in-the-loop (HITL) collaboration, and logging and monitoring of LLM applications.
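For the few-shot evaluation framework above (assuming it refers to EleutherAI's lm-evaluation-harness), here is a small sketch of driving an evaluation from Python via the `simple_evaluate` entry point; the model, task, and `limit` values are placeholders for a quick smoke test.

```python
# Minimal lm-evaluation-harness sketch: evaluate a small HF model on one task.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                     # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m", # placeholder model
    tasks=["hellaswag"],
    num_fewshot=0,
    limit=50,                                       # subsample for a quick check
)
print(results["results"]["hellaswag"])
```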