The Triton Inference Server provides an optimized cloud and edge inferencing solution
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Easiest and laziest way to build multi-agent LLM applications
Library for serving Transformers models on Amazon SageMaker
A Pythonic framework to simplify AI service building
GPU environment management and cluster orchestration
Ready-to-use OCR with 80+ supported languages
LLM training code for MosaicML foundation models
Bring the notion of Model-as-a-Service to life
A unified framework for scalable computing
Large Language Model Text Generation Inference
Replace OpenAI GPT with another LLM in your app
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Standardized Serverless ML Inference Platform on Kubernetes
A library for accelerating Transformer models on NVIDIA GPUs
Library for OCR-related tasks powered by Deep Learning
Low-latency REST API for serving text-embeddings
PyTorch library of curated Transformer models and their components
State-of-the-art diffusion models for image and audio generation
OpenAI-style API for open large language models
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Powering Amazon's custom machine learning chips
Open-source tool designed to enhance the efficiency of workloads
A high-performance ML model serving framework that offers dynamic batching
Unified Model Serving Framework