LLM inference in C/C++.
Locally running web search using LLM chains.
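As a rough illustration of the pattern, here is a minimal sketch of a search-then-summarize chain. The `web_search` stub and the `llm` callable are placeholders for whatever local search backend and model you wire in, not this project's actual API.

```python
from typing import Callable, List

def web_search(query: str) -> List[str]:
    # Placeholder: swap in a real local search backend here.
    return [f"Result snippet about {query!r} (stub)"]

def build_prompt(query: str, snippets: List[str]) -> str:
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Using only the context below, answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

def search_chain(query: str, llm: Callable[[str], str]) -> str:
    # Chain: web search -> prompt assembly -> LLM completion.
    snippets = web_search(query)
    return llm(build_prompt(query, snippets))

if __name__ == "__main__":
    stub_llm = lambda prompt: "[stub answer based on prompt]"  # stand-in for a local model
    print(search_chain("What is speculative decoding?", stub_llm))
```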
Confidently evaluate, test, and ship LLM applications with a suite of observability tools to calibrate language model outputs across your dev and production lifecycle.
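To make the evaluate-and-calibrate idea concrete, here is a hedged sketch of a tiny eval harness. The scoring rule, record format, and printed trace are illustrative stand-ins, not the tool's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalCase:
    prompt: str
    expected: str

def exact_match(output: str, expected: str) -> float:
    # Simplistic scorer; real suites use semantic or rubric-based scoring.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_eval(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    scores = []
    for case in cases:
        output = model(case.prompt)
        score = exact_match(output, case.expected)
        scores.append(score)
        print(f"prompt={case.prompt!r} score={score}")  # stand-in for structured trace logging
    return sum(scores) / len(scores)

if __name__ == "__main__":
    fake_model = lambda p: "Paris"
    cases = [EvalCase("Capital of France?", "Paris")]
    print("mean score:", run_eval(fake_model, cases))
```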
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
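The memory saving comes from storing weights in fewer bits. The NumPy sketch below shows plain symmetric int8 quantization as a simplified stand-in for the grouped or 4-bit schemes such rewrites typically use; it is not the project's actual quantization code.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Symmetric per-tensor quantization: map floats into [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)
print("fp32 bytes:", w.nbytes, "int8 bytes:", q.nbytes)  # 4x smaller
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```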
Building applications with LLMs through composability
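Composability here just means building pipelines from small, swappable pieces. Below is a minimal generic sketch of chaining a prompt template, a model call, and an output parser; these are plain Python functions, not the framework's actual classes.

```python
from typing import Callable

Step = Callable[[str], str]

def compose(*steps: Step) -> Step:
    # Chain steps left to right: output of one becomes input of the next.
    def pipeline(x: str) -> str:
        for step in steps:
            x = step(x)
        return x
    return pipeline

template = lambda topic: f"Write one sentence about {topic}."
model = lambda prompt: f"[model output for: {prompt}]"  # stand-in for an LLM call
parser = lambda text: text.strip("[]")

chain = compose(template, model, parser)
print(chain("composability"))
```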
A comprehensive set of tools for working with local LLMs across a range of tasks.
Gateway streamlines requests to 100+ open- and closed-source models through a unified API. It is production-ready, with support for caching, fallbacks, retries, timeouts, and load balancing, and can be edge-deployed for minimal latency.
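A hedged sketch of the fallback-with-retry pattern such a gateway implements: try the primary provider, retry transient failures with backoff, then fall back to the next provider. The provider names and the `call_model` stub are illustrative, not the gateway's real configuration format.

```python
import time
from typing import List

def call_model(provider: str, prompt: str) -> str:
    # Placeholder for an HTTP call to a provider's API.
    if provider == "primary":
        raise TimeoutError("primary provider timed out")
    return f"{provider} answered: {prompt}"

def gateway(prompt: str, providers: List[str], retries: int = 2, backoff: float = 0.1) -> str:
    # Try each provider in order; retry transient failures before falling back.
    for provider in providers:
        for attempt in range(retries):
            try:
                return call_model(provider, prompt)
            except TimeoutError:
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError("all providers failed")

print(gateway("hello", ["primary", "fallback"]))
```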