Nvidia Framework for LLM Inference
NanoFlow is a throughput-oriented, high-performance serving framework for LLMs. NanoFlow consistently delivers higher throughput than vLLM, DeepSpeed-FastGen, and TensorRT-LLM.
A high-throughput and memory-efficient inference and serving engine for LLMs.
A Python package for Text-to-SQL with self-hosting functionality and RESTful APIs compatible with both proprietary and open-source LLMs.
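As a sketch of how a client might call such a self-hosted Text-to-SQL service: the endpoint path (`/v1/txt2sql`) and field names below are illustrative assumptions, not the package's documented API.

```python
import json

def build_txt2sql_request(question: str, schema: str, model: str = "my-llm") -> dict:
    """Build a JSON payload for a hypothetical /v1/txt2sql REST endpoint.

    The field names here are assumptions for illustration only; a real
    deployment would use whatever request shape the package documents.
    """
    return {
        "model": model,        # proprietary or open-source LLM identifier
        "question": question,  # natural-language query to translate to SQL
        "schema": schema,      # DDL or table description given as context
    }

payload = build_txt2sql_request(
    "Total sales per region in 2023",
    "CREATE TABLE sales (region TEXT, amount REAL, year INT)",
)
body = json.dumps(payload)  # serialized request body for an HTTP POST
```

The schema is sent along with the question so the model can ground column and table names instead of guessing them.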
Framework to create ChatGPT-like bots over your dataset.
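A bot "over your dataset" typically works by retrieving the most relevant document and prepending it to the prompt. The sketch below illustrates that retrieval step with a simple bag-of-words cosine similarity; the document texts and function names are invented for the example and are not part of any listed framework.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, docs: list[str]) -> str:
    # Return the document most similar to the question (naive tokenization).
    q = Counter(question.lower().split())
    return max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))

docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is open Monday through Friday.",
]
context = retrieve("refund policy for returns", docs)
# The retrieved document becomes grounding context for the LLM prompt.
prompt = f"Answer using only this context:\n{context}\n\nQ: refund policy for returns"
```

Production frameworks replace the bag-of-words step with learned embeddings and a vector store, but the retrieve-then-prompt flow is the same.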
Data integration platform for LLMs.
Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality.
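The evaluate-and-catch-regressions workflow can be sketched as follows; the function names and the stub model below are invented for illustration and do not reflect any specific tool's API.

```python
def evaluate(prompts, model, checks):
    """Run each prompt variant through `model` and apply substring checks.

    `model` is any callable prompt -> str; a deterministic stub stands in
    for a real LLM here. Returns (prompt, passed) pairs so a regression
    shows up as a newly failing variant.
    """
    results = []
    for prompt in prompts:
        output = model(prompt)
        passed = all(expected in output for expected in checks)
        results.append((prompt, passed))
    return results

def stub_model(prompt: str) -> str:
    # Stand-in for an LLM call, used only to make the example runnable.
    if "capital" in prompt:
        return "Paris is the capital of France."
    return "I don't know."

report = evaluate(
    [
        "What is the capital of France?",
        "Name France's capital city.",   # rephrased variant
        "Tell me about France.",         # too vague: expected to fail
    ],
    stub_model,
    checks=["Paris"],
)
```

Running the same checks over every prompt variant is what lets a suite like this flag a regression when a prompt edit silently degrades output quality.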