
Inference Engines
IntelliServer
simplifies the evaluation of LLMs by providing a unified microservice to access and test multiple AI models.
FlexLLMGen is a high-throughput generation engine for running large language models with limited GPU memory. It achieves high throughput through IO-efficient offloading, compression, and large effective batch sizes.
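The offloading idea can be illustrated in a toy form: keep weights in CPU memory, load them one layer at a time, and amortize each load over many micro-batches so every weight matrix crosses the CPU-to-GPU link only once per pass. All names below are illustrative stand-ins, not FlexLLMGen's actual API, and the real engine additionally compresses weights and the KV cache.

```python
# Toy sketch of IO-efficient offloading with a large effective batch.
# Arrays stand in for tensors; a copy stands in for a CPU->GPU transfer.
import numpy as np

def generate_offloaded(weights_on_cpu, micro_batches):
    """Run every micro-batch through a layer before loading the next
    layer, so each weight matrix is transferred once per pass instead
    of once per micro-batch (the core of IO-efficient offloading)."""
    io_transfers = 0
    for w in weights_on_cpu:
        w_gpu = w.copy()          # stand-in for one CPU->GPU weight transfer
        io_transfers += 1
        for i, x in enumerate(micro_batches):
            micro_batches[i] = np.tanh(x @ w_gpu)  # stand-in layer compute
    return micro_batches, io_transfers

rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)) for _ in range(4)]
batches = [rng.standard_normal((16, 8)) for _ in range(32)]  # 512 rows total

out, transfers = generate_offloaded(layers, batches)
print(transfers)  # 4 transfers serve the whole 512-row effective batch,
                  # versus 4 * 32 if each micro-batch reloaded the weights
```

The large effective batch is what makes the offloading cost tolerable: transfer time is paid per layer, not per sequence, so throughput grows with batch size even though per-token latency is high.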