WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
Locally running websearch using LLM chains
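A local websearch chain of this kind typically has two steps: fetch search snippets, then assemble them into a prompt for a locally running model. The sketch below is a hypothetical illustration with stand-in stubs (`search` and the final model call are not a real API):

```python
# Hypothetical sketch of an LLM "chain" for local websearch:
# fetch results, stuff them into a prompt, then (in a real setup)
# pass the prompt to a locally running model.

def search(query: str) -> list[str]:
    # Stub: a real implementation would query a search backend.
    return [f"Result snippet about {query} (#{i})" for i in range(3)]

def build_prompt(query: str, snippets: list[str]) -> str:
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )

def answer(query: str) -> str:
    prompt = build_prompt(query, search(query))
    # Stub: a real chain would send `prompt` to a local LLM here
    # and return the generated text instead of the prompt itself.
    return prompt

print(answer("quantized Llama inference"))
```

The point of the chain structure is that each stage (retrieval, prompt assembly, generation) can be swapped independently, e.g. replacing the stub `search` with a real backend.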
An open-source NLP framework that lets you use LLMs and transformer-based models from Hugging Face, OpenAI, and Cohere to interact with your own data.
Inference for text embeddings in Python.
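Text-embedding inference maps documents to vectors and ranks them by similarity to a query vector. The toy sketch below illustrates the ranking step only; the vectors are hand-written stand-ins, not the output of a real embedding model:

```python
# Toy illustration of embedding-based retrieval: documents are mapped
# to vectors (hand-written here, not produced by a real model) and
# ranked by cosine similarity against a query vector.
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

docs = {
    "llama.cpp": [0.9, 0.1, 0.0],
    "haystack":  [0.2, 0.8, 0.1],
}
query = [0.85, 0.15, 0.05]

best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # "llama.cpp": its vector points closest to the query
```

In a real pipeline the vectors would come from an embedding model, but the nearest-neighbor ranking works the same way.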
Build your own conversational search engine in under 500 lines of code, by LeptonAI.
A playground for developers to fine-tune and deploy LLMs.