
Inference Engines
Langfuse
Open Source LLM Engineering Platform 🪢 Tracing, Evaluations, Prompt Management, Evaluations and Playground.
Open Source LLM Engineering Platform 🪢 Tracing, Evaluations, Prompt Management, Evaluations and Playground.
FlexLLMGen is a high-throughput generation engine for running large language models with limited GPU memory. FlexLLMGen allows high-throughput generation by IO-efficient offloading, compression, and large effective batch sizes.