
Inference Engines
Serge
A chat interface built on llama.cpp for running Alpaca models. No API keys required, entirely self-hosted!
A high-throughput, low-latency inference and serving framework for LLMs and VLMs.