
Inference Engines
Shell-Pilot
Interact with LLMs using Ollama models (or OpenAI, MistralAI) via pure shell scripts on your Linux (or macOS) system, enabling intelligent system management without any dependencies.
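Shell-Pilot itself is implemented as plain shell scripts, so the sketch below is only an illustration of the kind of interaction it automates: a minimal Python example that queries a locally running Ollama server over its default HTTP API. It assumes Ollama is listening on port 11434 and that a model named "llama3" has already been pulled; it is not Shell-Pilot's own code.

```python
# Minimal sketch: ask a locally running Ollama model a question over HTTP.
# Assumes the default Ollama port (11434) and an already-pulled "llama3" model.
import json
import urllib.request

def ask_ollama(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False, Ollama returns a single JSON object whose
        # "response" field holds the full completion text.
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_ollama("Summarize what `df -h` reports for a sysadmin."))
```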
NanoFlow
NanoFlow is a throughput-oriented, high-performance serving framework for LLMs. It consistently delivers higher throughput than vLLM, DeepSpeed-FastGen, and TensorRT-LLM.