FlexLLMGen | LLMWay – The Way To LLM

Inference Engines

FlexLLMGen


FlexLLMGen is a high-throughput generation engine for running large language models with limited GPU memory. It achieves high throughput through I/O-efficient offloading, compression, and large effective batch sizes.
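To give a rough sense of why a large effective batch size helps when weights are offloaded, the sketch below counts how often layer weights must be moved to the GPU. It is a simplified illustration, not FlexLLMGen's actual API or scheduler: the function name `weight_loads` and all of its parameters are hypothetical, and it assumes each layer's weights are loaded once per block of GPU batches and reused across that block.

```python
def weight_loads(num_layers: int, num_sequences: int,
                 gpu_batch_size: int, batches_per_block: int) -> int:
    """Count layer-weight transfers to the GPU for one pass over all sequences.

    Hypothetical model: a "block" is batches_per_block GPU batches that
    share a single load of each layer's weights.
    """
    # Effective batch size: sequences served by one load of a layer.
    effective_batch = gpu_batch_size * batches_per_block
    # Ceiling division: a partially filled block still needs a full load.
    num_blocks = -(-num_sequences // effective_batch)
    return num_layers * num_blocks


# Same workload, same GPU batch size; only the block size changes.
small_block = weight_loads(num_layers=4, num_sequences=256,
                           gpu_batch_size=32, batches_per_block=1)
large_block = weight_loads(num_layers=4, num_sequences=256,
                           gpu_batch_size=32, batches_per_block=8)
print(small_block, large_block)
```

Under these assumptions, grouping more GPU batches into one block lowers the number of weight transfers per generated token, which is the I/O cost that offloading otherwise makes dominant.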

Relevant Sites

IntelliServer 29

Simplifies the evaluation of LLMs by providing a unified microservice to access and test multiple AI models.

Floom 44

AI gateway and marketplace for developers that enables streamlined integration of AI features into products.

ollama 155,602

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

Opik 15,537

Confidently evaluate, test, and ship LLM applications with a suite of observability tools to calibrate language model outputs across your dev and production lifecycle.

LiteChain 421

Lightweight alternative to LangChain for composing LLMs.
