OCR expert VLM powered by Hunyuan's native multimodal architecture
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Qwen-Image is a powerful image generation foundation model
MedicalGPT: Training Your Own Medical GPT Model with a ChatGPT-style Training Pipeline
Chat & pretrained large vision language model
Capable of understanding text, audio, vision, and video
A SOTA open-source image editing model
Diversity-driven optimization and large-model reasoning ability
Tool for exploring and debugging transformer model behaviors
Qwen3-TTS is a series of open-source TTS models
Open-source framework for intelligent speech interaction
OpenTinker is an RL-as-a-Service infrastructure for foundation models
GLM-4 series: Open Multilingual Multimodal Chat LMs
Open-weight, large-scale hybrid-attention reasoning model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Chat & pretrained large audio language model proposed by Alibaba Cloud
An AI-powered security review GitHub Action using Claude
Renderer for the harmony response format to be used with gpt-oss
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
State-of-the-art (SoTA) text-to-video pre-trained model
Block Diffusion for Ultra-Fast Speculative Decoding
FAIR Sequence Modeling Toolkit 2
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Long-form streaming TTS system for multi-speaker dialogue generation
Controllable & emotion-expressive zero-shot TTS