LLM inference in C/C++.
SGLang is a fast serving framework for large language models and vision language models.
Get up and running with Llama 3, Mistral, Gemma, and other large language models.
Inference for text embeddings in Python.
An open-source NLP framework that lets you use LLMs and transformer-based models from Hugging Face, OpenAI, and Cohere to interact with your own data.
Fine-tune, serve, deploy, and monitor any open-source LLMs in production. Used in production at BentoML for LLM-based applications.
FlexLLMGen is a high-throughput generation engine for running large language models with limited GPU memory. FlexLLMGen allows high-throughput generation by IO-efficient offloading, compression, and large effective batch sizes.
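The offloading idea behind FlexLLMGen can be sketched in a few lines: keep all layer weights in host (CPU) memory, swap only one layer at a time into the limited device buffer, and push a large batch through that layer before the next transfer, so each weight load is amortized over many sequences. The sketch below is a toy illustration of that pattern in plain Python, not FlexLLMGen's actual API; all names (`cpu_weights`, `forward`, `gpu_buffer`) are hypothetical.

```python
import random

random.seed(0)
n_layers, d = 4, 8

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# All layer weights live in host memory (the "offloaded" store).
cpu_weights = [rand_matrix(d, d) for _ in range(n_layers)]

def matmul_relu(x, w):
    # One toy layer: matrix multiply followed by ReLU.
    return [[max(sum(row[k] * w[k][j] for k in range(len(w))), 0.0)
             for j in range(len(w[0]))] for row in x]

def forward(batch):
    h = batch
    for layer in range(n_layers):
        gpu_buffer = cpu_weights[layer]  # "load" one layer into limited device memory
        h = matmul_relu(h, gpu_buffer)   # run the entire batch through this layer
        del gpu_buffer                   # evict before swapping in the next layer
    return h

# A large effective batch amortizes the cost of each weight transfer.
batch = rand_matrix(32, d)
out = forward(batch)
print(len(out), len(out[0]))  # batch size and hidden size are preserved
```

In the real engine the transfers overlap with compute and weights/KV-cache can also be compressed, but the scheduling principle is the same: trade transfer latency for throughput via large effective batches.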