Building applications with LLMs through composability
FlexLLMGen is a high-throughput generation engine for running large language models with limited GPU memory. FlexLLMGen achieves high throughput through IO-efficient offloading, compression, and large effective batch sizes.
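The core trick behind offloading engines like this is to amortize slow weight transfers over many inputs: load a layer's weights once, apply them to every micro-batch of a large effective batch, then move on to the next layer. The sketch below is purely illustrative (toy scalar "weights", not FlexLLMGen's actual API; all names are hypothetical):

```python
# Illustrative sketch of layer-wise offloading, NOT FlexLLMGen's real API.
# Weights live on slow storage and are "loaded" once per layer; each load
# is then reused across every micro-batch, so the IO cost per token shrinks
# as the effective batch grows.

def offloaded_forward(layer_weights, micro_batches):
    """Apply each layer to all micro-batches before loading the next layer."""
    for weight in layer_weights:          # one slow "load" per layer
        # reuse the loaded weight across the whole effective batch
        micro_batches = [[x * weight for x in mb] for mb in micro_batches]
    return micro_batches

layers = [2, 3]                  # toy per-layer "weights" on slow storage
batches = [[1, 2], [3, 4]]       # large effective batch, split into micro-batches
print(offloaded_forward(layers, batches))  # [[6, 12], [18, 24]]
```

With real models the per-layer load is a GPU transfer of tensors rather than a scalar, but the scheduling idea (layer-major, not batch-major) is the same.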
Build your own conversational search engine in under 500 lines of code, by LeptonAI.
Blazingly fast LLM inference.
Framework to create ChatGPT-like bots over your dataset.
AI gateway and marketplace for developers that enables streamlined integration of AI features into products.
A toolkit for deploying and serving Large Language Models (LLMs).