Inference Engines
LMDeploy
A high-throughput, low-latency inference and serving framework for LLMs and vision-language models (VLMs)
A toolkit for deploying and serving Large Language Models (LLMs).