
Inference Engines
FasterTransformer
NVIDIA's framework for LLM inference (transitioned to TensorRT-LLM)
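Since FasterTransformer's functionality now lives in TensorRT-LLM, a minimal generation sketch using TensorRT-LLM's high-level LLM API might look like the following. This assumes a recent `tensorrt_llm` release with the LLM API installed and a supported NVIDIA GPU; the model name and sampling parameters are illustrative.

```python
from tensorrt_llm import LLM, SamplingParams

# Build/load an engine for a Hugging Face model (name is illustrative).
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Batched generation; each output carries the prompt and generated text.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")
```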
DeepSpeed-MII
MII is a DeepSpeed-powered library that delivers low-latency, high-throughput inference, similar to vLLM.
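A minimal sketch of MII's non-persistent pipeline API, assuming `deepspeed-mii` is installed (`pip install deepspeed-mii`); the model name and `max_new_tokens` value are illustrative.

```python
import mii

# Load a Hugging Face model into an in-process inference pipeline.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Batched generation; returns one response object per prompt.
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=64)
for response in responses:
    print(response.generated_text)
```

MII also supports a persistent deployment mode for serving the same model to multiple clients, which is where its throughput optimizations matter most.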