exllama | LLMWay – The Way To LLM

Inference Engines

exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.


Relevant Sites

MindSQL 425

A Python package for text-to-SQL with self-hosting functionality and RESTful APIs, compatible with both proprietary and open-source LLMs.

wechat-chatgpt 13,293

Use ChatGPT on WeChat via Wechaty.

SGLang 20,062

SGLang is a fast serving framework for large language models and vision language models.

Opik 15,537

Confidently evaluate, test, and ship LLM applications with a suite of observability tools to calibrate language model outputs across your dev and production lifecycle.

TensorRT-LLM 12,074

NVIDIA's framework for LLM inference.

MNN-LLM 13,446

An on-device inference framework, including LLM inference on mobile phones, PCs, and IoT devices.
