LLM inference in C/C++.
Easily build, version, evaluate and deploy your LLM-powered apps.
Run LLMs and batch jobs on any cloud. Get maximum cost savings, the highest GPU availability, and managed execution, all through a simple interface.
Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality.
A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs (see the request sketch after this list).
Confidently evaluate, test, and ship LLM applications with a suite of observability tools that calibrate language model outputs across your development and production lifecycle.
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
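Because the serving system above exposes OpenAI-compatible RESTful APIs, any OpenAI-style HTTP client can talk to it. Below is a minimal sketch in Python; the base URL, port, and model name are assumptions for illustration, not values taken from any of the projects listed here.

import requests

# Hypothetical local endpoint following the OpenAI chat completions schema.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "my-model",  # placeholder model name, not from this page
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)

# An OpenAI-compatible server returns a list of choices, each with a message.
print(response.json()["choices"][0]["message"]["content"])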