
Inference Engines
FastChat
A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs.
An open-source GPU cluster manager for running LLMs.