
Inference Engines
DeepSpeed-MII
MII enables low-latency and high-throughput inference, similar to vLLM, and is powered by DeepSpeed.
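As a quick illustration, here is a minimal sketch of local inference with MII's pipeline API, following the usage pattern shown in the DeepSpeed-MII README; the model name and prompts are only examples, and any supported Hugging Face text-generation model can be substituted:

```python
import mii

# Load a Hugging Face model into a local, non-persistent MII pipeline.
# The model name is an example; substitute any supported model.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Generate completions for a batch of prompts with a fixed token budget.
responses = pipe(["DeepSpeed is", "Low-latency inference means"], max_new_tokens=64)
for response in responses:
    print(response.generated_text)
```

For long-lived serving rather than a one-off pipeline, MII also offers a persistent deployment mode (`mii.serve`) that keeps the model resident and accepts requests from multiple clients.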
Serge
A chat interface crafted with llama.cpp for running Alpaca models. No API keys, entirely self-hosted!
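Since the interface is built on llama.cpp, the same key-free, fully local setup can be sketched with the llama-cpp-python bindings (an assumption for illustration; the project itself ships as a dockerized web UI). The GGUF model path below is a placeholder for wherever the model file lives:

```python
from llama_cpp import Llama

# Load a local GGUF model; no API key or network access is required.
# The path is a placeholder, not a file the project provides.
llm = Llama(model_path="./models/alpaca-7b.Q4_K_M.gguf", n_ctx=2048)

# One chat turn through the OpenAI-style chat completion helper.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain self-hosting in one sentence."}]
)
print(result["choices"][0]["message"]["content"])
```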