Inference Engines | Site Categories | LLMWay

vLLM 57,880

A high-throughput and memory-efficient inference and serving engine for LLMs.

SGLang 17,877

SGLang is a fast serving framework for large language models and vision language models.

LMDeploy 7,034

A high-throughput and low-latency inference and serving framework for LLMs and VLs

ollama 152,237

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

Nanoflow 887

NanoFlow is a throughput-oriented high-performance serving framework for LLMs. NanoFlow consistently delivers superior throughput compared to vLLM, Deepspeed-FastGen, and TensorRT-LLM.

llama.cpp 86,420

LLM inference in C/C++.

Langfuse 16,073

Open Source LLM Engineering Platform 🪢 Tracing, Evaluations, Prompt Management, Evaluations and Playground.

FastChat 39,085

A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs.

mistral.rs 6,077

Blazingly fast LLM inference.

MindSQL 422

A python package for Txt-to-SQL with self hosting functionalities and RESTful APIs compatible with proprietary as well as open source LLM.

SkyPilot 8,715

Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface.

Haystack

an open-source NLP framework that allows you to use LLMs and transformer-based models from Hugging Face, OpenAI and Cohere to interact with your own data.

exllama 2,897

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

QA-Pilot 306

An interactive chat project that leverages Ollama/OpenAI/MistralAI LLMs for rapid understanding and navigation of GitHub code repository or compressed file resources.

Shell-Pilot 107

Interact with LLM using Ollama models(or openAI, mistralAI)via pure shell scripts on your Linux(or MacOS) system, enhancing intelligent system management without any dependencies.

LangChain 115,362

Building applications with LLMs through composability

Floom 44

AI gateway and marketplace for developers, enables streamlined integration of AI features into products

Swiss Army Llama 1,029

Comprehensive set of tools for working with local LLMs for various tasks.

LiteChain 419

Lightweight alternative to LangChain for composing LLMs

magentic 2,365

Seamlessly integrate LLMs as Python functions

wechat-chatgpt 13,302

Use ChatGPT On Wechat via wechaty

promptfoo 8,361

Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality.

Agenta 3,151

Easily build, version, evaluate and deploy your LLM-powered apps.

Serge 5,750

a chat interface crafted with llama.cpp for running Alpaca models. No API keys, entirely self-hosted!

Langroid 3,693

Harness LLMs with Multi-Agent Programming

Embedchain 39,761

Framework to create ChatGPT like bots over your dataset.

Opik 13,813

Confidently evaluate, test, and ship LLM applications with a suite of observability tools to calibrate language model outputs across your dev and production lifecycle.

IntelliServer 30

simplifies the evaluation of LLMs by providing a unified microservice to access and test multiple AI models.

OpenLLM 11,766

Fine-tune, serve, deploy, and monitor any open-source LLMs in production. Used in production at BentoML for LLMs-based applications.

DeepSpeed-Mii 2,053

MII makes low-latency and high-throughput inference, similar to vLLM powered by DeepSpeed.

Text-Embeddings-Inference 4,004

Inference for text-embeddings in Rust, HFOIL Licence.

Infinity 2,435

Inference for text-embeddings in Python

TensorRT-LLM 11,571

Nvidia Framework for LLM Inference

FasterTransformer 6,300

NVIDIA Framework for LLM Inference(Transitioned to TensorRT-LLM)

Flash-Attention 19,451

A method designed to enhance the efficiency of Transformer models

Langchain-Chatchat 36,071

Formerly langchain-ChatGLM, local knowledge based LLM (like ChatGLM) QA app with langchain.

Search with Lepton 8,133

Build your own conversational search engine using less than 500 lines of code by LeptonAI.

Robocorp 564

Create, deploy and operate Actions using Python anywhere to enhance your AI agents and assistants. Batteries included with an extensive set of libraries, helpers and logging.

LLocalSearch 5,950

Locally running websearch using LLM chains

AI Gateway 9,346

Gateway streamlines requests to 100+ open & closed source models with a unified API. It is also production-ready with support for caching, fallbacks, retries, timeouts, loadbalancing, and can be edge-deployed for minimum latency.

talkd.ai dialog 418

Simple API for deploying any RAG or LLM that you want adding plugins.

Wllama 884

WebAssembly binding for llama.cpp - Enabling in-browser LLM inference

GPUStack 3,692

An open-source GPU cluster manager for running LLMs

MNN-LLM 13,045

A Device-Inference framework, including LLM Inference on device(Mobile Phone/PC/IOT)

FlexLLMGen 9,364

FlexLLMGen is a high-throughput generation engine for running large language models with limited GPU memory. FlexLLMGen allows high-throughput generation by IO-efficient offloading, compression, and large effective batch sizes.