Nanoflow | LLMWay – The Way To LLM


NanoFlow

NanoFlow is a throughput-oriented, high-performance serving framework for LLMs. Its authors report that it consistently delivers higher throughput than vLLM, DeepSpeed-FastGen, and TensorRT-LLM.



Relevant Sites

IntelliServer 29

Simplifies the evaluation of LLMs by providing a unified microservice for accessing and testing multiple AI models.

FlexLLMGen 9,376

FlexLLMGen is a high-throughput generation engine for running large language models with limited GPU memory. It achieves high throughput through IO-efficient offloading, compression, and large effective batch sizes.

Infinity 2,535

Inference for text-embedding models in Python.

Serge 5,754

A chat interface built on llama.cpp for running Alpaca models. No API keys required; entirely self-hosted.

Shell-Pilot 109

Interact with LLMs using Ollama models (or OpenAI and Mistral AI models) via pure shell scripts on your Linux (or macOS) system, enabling intelligent system management without any dependencies.

OpenLLM 11,924

Fine-tune, serve, deploy, and monitor any open-source LLM in production. Used in production at BentoML for LLM-based applications.

