Inference Engines
LMDeploy
A high-throughput, low-latency inference and serving framework for LLMs and vision-language models (VLMs)
A toolkit for deploying and serving Large Language Models (LLMs).