Inference Engines
FastChat
A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs.
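Since FastChat exposes OpenAI-compatible RESTful APIs, a minimal sketch of querying a locally running FastChat server with the official `openai` Python client might look like the following. The port (8000 is FastChat's documented default for its API server), the placeholder API key, and the model name `vicuna-7b-v1.5` are assumptions; adjust them to match your deployment.

```python
# Minimal sketch: querying a FastChat OpenAI-compatible API server.
# Assumes the server runs locally on port 8000 and that a model worker
# serving a model such as "vicuna-7b-v1.5" (assumed name) is registered.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # FastChat's OpenAI-compatible endpoint (assumed default port)
    api_key="EMPTY",                      # FastChat accepts a placeholder key by default
)

response = client.chat.completions.create(
    model="vicuna-7b-v1.5",  # assumed model name; served models can be listed via client.models.list()
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```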
Blazingly fast LLM inference.