A Pragmatic Vision-Language-Action (VLA) Foundation Model
LTX-Video Support for ComfyUI
Pushing the Limits of Mathematical Reasoning in Open Language Models
Tongyi Deep Research, the Leading Open-Source Deep Research Agent
GLM-4 series: Open Multilingual Multimodal Chat LMs
ICLR 2024 Spotlight: curation/training code, metadata, and distribution
Memory-efficient and performant finetuning of Mistral's models
PyTorch code and models for the DINOv2 self-supervised learning method
Unified Multimodal Understanding and Generation Models
An AI-powered security review GitHub Action using Claude
Open-source repository for the Pokee Deep Research model
Tooling for the Common Objects In 3D dataset
Renderer for the harmony response format to be used with gpt-oss
CLIP (Contrastive Language-Image Pre-Training): predicts the most relevant text snippet given an image (see the usage sketch after this list)
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud
HY-Motion model for 3D character animation generation
Large Multimodal Models for Video Understanding and Editing
Open Source Speech Language Model
Collection of Gemma 3 variants trained for performance
Implementation of "MobileCLIP" CVPR 2024
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Pretrained time-series foundation model developed by Google Research
Ling-V2 is a Mixture-of-Experts (MoE) LLM open-sourced by InclusionAI
OCR expert VLM powered by Hunyuan's native multimodal architecture
State-of-the-art image & video CLIP and multimodal large language models
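For the CLIP entry above, here is a minimal zero-shot sketch, assuming the pip-installable openai/CLIP package; the checkpoint name, image path, and candidate captions are illustrative placeholders, not values from this list.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "example.jpg" and the caption list below are placeholder inputs.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # logits_per_image holds scaled cosine similarities between the
    # image embedding and each caption embedding.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Caption probabilities:", probs)  # highest value = most relevant snippet
```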