LLMWay – The Way To LLM
llama.cpp

LLM inference in C/C++.

Relevant Sites

SGLang 20,062

SGLang is a fast serving framework for large language models and vision language models.

ollama 155,602

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

Infinity 2,535

Inference for text embeddings in Python.

Haystack

An open-source NLP framework that lets you use LLMs and transformer-based models from Hugging Face, OpenAI, and Cohere to interact with your own data.

OpenLLM 11,924

Fine-tune, serve, deploy, and monitor any open-source LLM in production. Used in production at BentoML for LLM-based applications.

FlexLLMGen 9,376

FlexLLMGen is a high-throughput generation engine for running large language models with limited GPU memory. FlexLLMGen allows high-throughput generation by IO-efficient offloading, compression, and large effective batch sizes.

Copyright © 2025 LLMWay – The Way To LLM