Building applications with LLMs through composability
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights (see the loading sketch below).
Fine-tune, serve, deploy, and monitor any open-source LLM in production. Used in production at BentoML for LLM-based applications.
A simple API for deploying any RAG pipeline or LLM you want, with support for adding plugins.
Use ChatGPT on WeChat via wechaty
A method designed to enhance the efficiency of Transformer models
Lightweight alternative to LangChain for composing LLMs
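The first and last entries above both revolve around "composing" LLM calls, i.e. chaining prompt templating, a model call, and post-processing into a reusable pipeline. The following is a minimal, library-agnostic sketch of that idea in plain Python; the call_llm function is a hypothetical stand-in for whatever client these tools wrap, and this is not the API of LangChain or of the lightweight alternative listed above.

# Conceptual sketch of LLM "composability": small steps (prompt templating,
# model call, parsing) chained into a reusable pipeline.
# call_llm is a hypothetical placeholder, not a real library call.
from typing import Callable

Step = Callable[[str], str]

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return f"<model output for: {prompt!r}>"

def compose(*steps: Step) -> Step:
    """Chain steps left to right: each step's output feeds the next step."""
    def pipeline(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return pipeline

summarize = compose(
    lambda doc: f"Summarize the following text in one sentence:\n{doc}",  # prompt template
    call_llm,                                                             # model call
    str.strip,                                                            # post-processing
)

print(summarize("LLM frameworks let you build applications by composing small steps."))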
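For the quantized-weights entry earlier in the list: the stock HF transformers implementation can already load Llama-family checkpoints with quantized weights via bitsandbytes, and the rewrite mentioned above targets the same use case with a smaller memory footprint. The snippet below is a minimal sketch of the baseline path only, assuming the transformers, accelerate, and bitsandbytes packages, a CUDA GPU, and a placeholder model id; it is not that project's code.

# Minimal sketch: loading a Llama-family model with 4-bit quantized weights
# through standard transformers + bitsandbytes (requires a CUDA GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint, swap in your own

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4 bits
    bnb_4bit_quant_type="nf4",             # NF4 quantization scheme
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                     # let accelerate place the layers
)

inputs = tokenizer("Hello, quantized Llama!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))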