
Inference Engines
vLLM
A high-throughput and memory-efficient inference and serving engine for LLMs.
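A minimal offline-inference sketch using vLLM's Python API (LLM, SamplingParams, generate); the model name, prompts, and sampling values are illustrative, not prescribed by the project.

```python
from vllm import LLM, SamplingParams

# Load a model once; vLLM manages KV-cache memory with PagedAttention.
llm = LLM(model="facebook/opt-125m")  # model choice is illustrative

# Sampling settings for generation.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# A batch of prompts; vLLM schedules them for high-throughput batched decoding.
prompts = ["The capital of France is", "Large language models are"]
outputs = llm.generate(prompts, params)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```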
NanoFlow
A throughput-oriented, high-performance serving framework for LLMs. Its authors report that NanoFlow consistently delivers higher throughput than vLLM, DeepSpeed-FastGen, and TensorRT-LLM.