Nvidia Framework for LLM Inference
A method designed to enhance the efficiency of Transformer models
Easily build, version, evaluate and deploy your LLM-powered apps.
An interactive chat project that leverages Ollama, OpenAI, or MistralAI LLMs for rapidly understanding and navigating GitHub code repositories or compressed file archives.
An open-source GPU cluster manager for running LLMs
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
Gateway streamlines requests to 100+ open- and closed-source models through a unified API. It is production-ready, with support for caching, fallbacks, retries, timeouts, and load balancing, and can be edge-deployed for minimal latency.