
Inference Engines
FastChat
A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs.
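Because FastChat exposes OpenAI-compatible endpoints, any OpenAI-style client can talk to a FastChat server. A minimal sketch of constructing such a chat-completion request; the endpoint URL and model name here are illustrative assumptions, not values taken from this document:

```python
import json

# Hypothetical local FastChat endpoint; adjust host/port for your deployment.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model, messages, temperature=0.7):
    """Build an OpenAI-style chat-completion payload for a FastChat server."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
    }

# Model name is an assumption; use whichever model your server is serving.
payload = build_chat_request(
    "vicuna-7b-v1.5",
    [{"role": "user", "content": "Hello!"}],
)
body = json.dumps(payload)  # ready to POST to API_URL with any HTTP client
```

The same payload shape works against the official OpenAI API, which is the point of the compatibility layer: existing client code can be repointed at a self-hosted server by changing only the base URL.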
OpenLLM
Fine-tune, serve, deploy, and monitor any open-source LLM in production. Used in production at BentoML for LLM-based applications.