Structured outputs for LLMs
Accelerate local LLM inference and finetuning
A high-throughput and memory-efficient inference and serving engine
Uncertainty Quantification for Language Models is a Python package
PandasAI is a Python library that integrates generative AI
PyTorch library of curated Transformer models and their components
Gemma open-weight LLM library, from Google DeepMind
A Python module to repair invalid JSON from LLMs
Scalable data preprocessing and curation toolkit for LLMs
Access large language models from the command line
Synthetic data curation for post-training and data extraction
Open source libraries and APIs to build custom preprocessing pipelines
Accessible large language models via k-bit quantization for PyTorch
LLM abstractions that aren't obstructions
AirLLM: 70B inference with a single 4GB GPU
Easy token price estimates for 400+ LLMs. TokenOps
The Security Toolkit for LLM Interactions
Replace OpenAI GPT with another LLM in your app
Tools for merging pretrained large language models
⚡ Building applications with LLMs through composability ⚡
Schema-Guided Reasoning (SGR) for agentic system design
DepGraph: Towards Any Structural Pruning
Advanced techniques for RAG systems
[NeurIPS2025 Spotlight] Quantized Attention
A New Axis of Sparsity for Large Language Models