Open Source Python Artificial Intelligence Software - Page 5

Python Artificial Intelligence Software

View 13628 business solutions

Browse free open source Python Artificial Intelligence Software and projects below. Use the toggles on the left to filter open source Python Artificial Intelligence Software by OS, license, language, programming language, and project status.

  • Empower Your Contact Center with Human-Like AI Conversations Icon
    Empower Your Contact Center with Human-Like AI Conversations

    Deliver faster resolutions, lower costs, and better CX without hiring another agent.

    Enterprise Bot, based in Switzerland, is a pioneer in Conversational AI, Process Automation, and Generative AI. With the trust of esteemed enterprise giants across industries like Generali, SIX, SBB, DHL, and SWICA, Enterprise Bot is revolutionizing both customer and employee experiences. Through its advanced integration with Large Language Models (LLM) such as ChatGPT and Llama 2, and its unique patent-pending DocBrain technology, the company delivers unparalleled personalization, active engagement, and omnichannel solutions across platforms like email, voice, and chat. Furthermore, Enterprise Bot integrates with existing core systems, such as SAP, CRMs, Confluence and more, and with its proprietary middleware, Blitzico, enables the AI to not only respond to queries but also take action to resolve them. This dedication to innovation in four main use case areas, Customer Support, Sales and Marketing, Knowledge Management and Digital Coworker, elevates both CX and employee productivity.
    Learn More
  • Professional Streaming and Video Hosting - GDPR Compliant - 3Q Icon
    Professional Streaming and Video Hosting - GDPR Compliant - 3Q

    Secure hosting, scalable streaming, and easy integration for internal and external communications

    3Q offers a multifunctional video platform for hosting, managing and distributing video and audio content on all channels. Live and on-demand.
    Learn More
  • 1
    Open-LLM-VTuber

    Open-LLM-VTuber

    Open source AI VTuber platform with voice chat and Live2D avatars

    Open-LLM-VTuber is an open source platform designed to create AI-powered VTuber characters that can interact with users through voice and animated avatars. It enables hands-free conversations with large language models by combining speech recognition, language processing, and text-to-speech synthesis into a single system. Users can speak directly to the AI character, and the system can respond with a generated voice while animating a Live2D avatar to simulate a talking virtual personality. Open-LLM-VTuber is modular, allowing developers to swap or configure different language models, speech recognition engines, and voice synthesis systems depending on their needs. It can run locally and supports both offline and online AI services, giving users flexibility in how models and resources are used. Open-LLM-VTuber was originally inspired by the goal of recreating an AI VTuber experience using open source tools that work across multiple operating systems.
    Downloads: 26 This Week
    Last Update:
    See Project
  • 2
    OpenVoice

    OpenVoice

    Instant voice cloning by MIT and MyShell. Audio foundation model

    OpenVoice is a versatile instant voice cloning system that can replicate a speaker’s tone color from just a short audio clip and then generate speech in multiple languages. It is designed not only to match the timbre of the reference voice, but also to give granular control over style parameters such as emotion, accent, rhythm, pauses, and intonation. The model supports cross-lingual and even zero-shot cross-lingual voice cloning, so a speaker recorded in one language can be made to speak naturally in others. Architecturally, OpenVoice separates “tone color” cloning from style control, which makes it easier to keep a consistent identity while flexibly changing prosody or language. The project provides open-weight models, inference code, and examples, making it suitable both for research and for building production voice experiences. It is actively developed by MyShell, which also integrates OpenVoice into broader agent and entertainment workflows.
    Downloads: 26 This Week
    Last Update:
    See Project
  • 3
    ShellGPT

    ShellGPT

    A command-line productivity tool powered by AI large language models

    A command-line productivity tool powered by AI large language models (LLM). This command-line tool offers a streamlined generation of shell commands, code snippets, and documentation, eliminating the need for external resources (like Google search). Supports Linux, macOS, and Windows and is compatible with all major Shells like PowerShell, CMD, Bash, Zsh, etc. By default, ShellGPT uses OpenAI's API and GPT-4 model. You'll need an API key, you can generate one here. You will be prompted for your key which will then be stored in ~/.config/shell_gpt/.sgptrc. OpenAI API is not free of charge, please refer to the OpenAI pricing for more information.
    Downloads: 26 This Week
    Last Update:
    See Project
  • 4
    Tabnine

    Tabnine

    Vim client for TabNine

    Tabnine is an AI-powered code completion extension trusted by millions of developers around the world. Whether you’re just getting started as a developer or if you’ve been doing it for decades, Tabnine will help you code twice as fast with half the keystrokes – all in your favorite IDE. Whether you call it IntelliSense, intelliCode, autocomplete, AI-assisted code completion, AI-powered code completion, AI copilot, AI code snippets, code suggestion, code prediction, code hinting, or content assist, you probably already know that it can save you tons of time, easily cutting your keystrokes in half. Powered by sophisticated machine learning models trained on billions of lines of trusted open source code from GitHub, Tabnine is the most advanced AI-powered code completion copilot available today. And like GitHub, it is an essential tool for professional developers.
    Downloads: 25 This Week
    Last Update:
    See Project
  • Self-hosted n8n: No-code AI workflows Icon
    Self-hosted n8n: No-code AI workflows

    Connect workflows. Integrate data

    A free-to-use workflow automation tool, n8n lets you connect all your apps and data in one customizable, no-code platform. Design workflows and process data from a simple, unified dashboard.
    Learn More
  • 5
    DINOv3

    DINOv3

    Reference PyTorch implementation and models for DINOv3

    DINOv3 is the third-generation iteration of Meta’s self-supervised visual representation learning framework, building upon the ideas from DINO and DINOv2. It continues the paradigm of learning strong image representations without labels using teacher–student distillation, but introduces a simplified and more scalable training recipe that performs well across datasets and architectures. DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while maintaining or improving feature quality. The model supports multiple backbone architectures, including Vision Transformers (ViT), and can handle larger image resolutions with improved stability during training. The learned embeddings generalize robustly across tasks like classification, retrieval, and segmentation without fine-tuning, showing state-of-the-art transfer performance among self-supervised models.
    Downloads: 24 This Week
    Last Update:
    See Project
  • 6
    VGGFace2

    VGGFace2

    VGGFace2 Dataset for Face Recognition

    VGGFace2 is a large-scale face recognition dataset developed to support research on facial recognition across variations in pose, age, illumination, and identity. It consists of 3.31 million images covering 9,131 subjects, with an average of over 360 images per subject. The dataset was collected from Google Image Search, ensuring a wide diversity in ethnicity, profession, and real-world conditions. It is split into a training set with 8,631 identities and a test set with 500 identities, making it suitable for benchmarking and large-scale model training. Alongside the dataset, the repository provides pre-trained models based on ResNet-50 and SE-ResNet-50 architectures, trained with both MS-Celeb-1M pretraining and fine-tuning on VGGFace2. These models achieve strong verification performance on benchmarks such as IJB-B and include variants with lower-dimensional embeddings for compact feature representation. The project also includes preprocessing tools, face detection scripts, and etc.
    Downloads: 24 This Week
    Last Update:
    See Project
  • 7
    python-telegram-bot

    python-telegram-bot

    A Python wrapper you can't refuse

    python-telegram-bot is a library that provides a pure Python interface for the Telegram Bot API. It supports all types and methods of the API 4.8, and is compatible with all Python versions 3.5+ as well as PyPy. Apart from the pure API implementation, python-telegram-bot also offers several high-level classes contained in the telegram.ext submodule. These make bot development much easier and straightforward. python-telegram-bot is free and open source, fun to use, and fast and easy to install. Visit https://github.com/python-telegram-bot/python-telegram-bot/blob/master/examples/README.md to see official examples or the project’s wiki on https://github.com/python-telegram-bot/python-telegram-bot/wiki/Examples to see other community-built bots.
    Downloads: 24 This Week
    Last Update:
    See Project
  • 8
    Alpa

    Alpa

    Training and serving large-scale neural networks

    Alpa is a system for training and serving large-scale neural networks. Scaling neural networks to hundreds of billions of parameters has enabled dramatic breakthroughs such as GPT-3, but training and serving these large-scale neural networks require complicated distributed system techniques. Alpa aims to automate large-scale distributed training and serving with just a few lines of code.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 9
    CoPaw

    CoPaw

    Your Personal AI Assistant; easy to install, deploy on local or coud

    CoPaw is a personal AI assistant designed to run on your own machine or in the cloud, giving you full control over memory, models, and data. Built by the AgentScope team, it connects to multiple chat platforms—including DingTalk, Feishu, QQ, Discord, iMessage, and more—through a single unified assistant. CoPaw supports both cloud-based LLM providers and fully local models such as llama.cpp, MLX, and Ollama, allowing you to operate without API keys if preferred. It includes a browser-based Console for chatting, configuring models, managing memory, and extending capabilities with custom skills. With built-in cron scheduling, heartbeat check-ins, and extensible skill loading, CoPaw grows with your workflow over time. Easy installation options—including pip, one-line scripts, Docker, and cloud deployment—make it accessible for both developers and non-technical users.
    Downloads: 23 This Week
    Last Update:
    See Project
  • Zendesk: The Complete Customer Service Solution Icon
    Zendesk: The Complete Customer Service Solution

    Discover AI-powered, award-winning customer service software trusted by 200k customers

    Equip your agents with powerful AI tools and workflows that boost efficiency and elevate customer experiences across every channel.
    Learn More
  • 10
    Fish Speech

    Fish Speech

    SOTA Open Source TTS

    Fish Speech is a state-of-the-art open-source text-to-speech project that has evolved into the OpenAudio series of advanced TTS models. The repository hosts the code and tooling for training, fine-tuning, and serving high-quality TTS, while the current flagship models (OpenAudio-S1 and S1-mini) are distributed via Fish Audio’s playground and Hugging Face. The models are evaluated with Seed TTS metrics and achieve exceptionally low word and character error rates, indicating strong intelligibility and alignment between text and audio. Fish Speech emphasizes expressive and controllable voices: it supports a long list of emotion tags, tone markers, and special audio effect markers that can be embedded in the text to drive prosody and vocal style, from basic emotions to nuanced states like sarcastic, conciliative, or hysterical. The system is multilingual and cross-lingual, handling multiple languages in a single input without explicit phoneme markup, and is trained on large-scale datasets.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 11
    GLM-OCR

    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. The model’s multimodal capabilities allow it to reason across image and text content holistically, capturing structured and unstructured information from pages that include dense tables, seals, code snippets, and varied document graphics. GLM-OCR integrates a comprehensive SDK and inference toolchain that makes it easy for developers to install, invoke, and embed into production pipelines with simple commands or APIs.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 12
    Hunyuan3D-2.1

    Hunyuan3D-2.1

    From Images to High-Fidelity 3D Assets

    Hunyuan3D-2.1 is Tencent Hunyuan’s advanced 3D asset generation system that produces high-fidelity 3D models with Physically Based Rendering (PBR) textures. It is fully open-source with released model weights, training, and inference code. It improves on prior versions by using a PBR texture pipeline (enabling realistic material effects like reflections and subsurface scattering) and allowing community fine-tuning and extension. It supports both shape generation (mesh geometry) and texture generation modules. Physically Based Rendering texture synthesis to model realistic material effects, including reflections, subsurface scattering, etc. Cross-platform support (MacOS, Windows, Linux) via Python / PyTorch, including diffusers-style APIs.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 13
    Loki Mode

    Loki Mode

    Multi-agent autonomous startup system for Claude Code

    Loki Mode is a multi-agent autonomous execution system designed to take structured product requirements or specifications and autonomously drive the creation, testing, deployment, and scaling of complex software projects using a large team of specialized AI agents. It orchestrates dozens of agent types across swarms that handle designated roles — such as architecture, coding, QA, deployment, and business workflows — running in parallel to cover both engineering and operational tasks without continuous human intervention. By supporting multiple AI providers (like Claude Code, OpenAI Codex CLI, and Google Gemini CLI), loki-mode dynamically selects and spawns only the needed agents for a given project, optimizing computational resources and task throughput. Its Reason-Act-Reflect-Verify (RARV) cycle with self-verification loops emphasizes quality and resilience, automating end-to-end development lifecycles.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 14
    LightZero

    LightZero

    [NeurIPS 2023 Spotlight] LightZero

    LightZero is an efficient, scalable, and open-source framework implementing MuZero, a powerful model-based reinforcement learning algorithm that learns to predict rewards and transitions without explicit environment models. Developed by OpenDILab, LightZero focuses on providing a highly optimized and user-friendly platform for both academic research and industrial applications of MuZero and similar algorithms.
    Downloads: 22 This Week
    Last Update:
    See Project
  • 15
    WhisperJAV

    WhisperJAV

    Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

    WhisperJAV is an open-source speech transcription pipeline designed specifically for generating subtitles for Japanese adult video content. The project addresses challenges that standard speech recognition models face when transcribing this type of audio, which often includes low signal-to-noise ratios and large numbers of non-verbal vocalizations. Traditional automatic speech recognition systems can misinterpret these sounds as words, leading to inaccurate transcripts. WhisperJAV introduces a specialized pipeline that separates text generation from timestamp alignment, allowing the system to generate transcripts and then align them with audio using forced alignment techniques. The framework supports several speech recognition models, including Qwen-based ASR systems and fine-tuned Whisper models trained on domain-specific dialogue.
    Downloads: 22 This Week
    Last Update:
    See Project
  • 16
    BrowserGym

    BrowserGym

    A Gym environment for web task automation

    BrowserGym is an open framework for web task automation research that exposes browser interaction as a Gym-style environment for training and evaluating agents. It is intended for researchers building web agents rather than for end users looking for a consumer automation product. The project provides a common environment where agents can interact with websites, execute tasks, and be evaluated against standardized benchmarks. One of its main strengths is that it bundles several important benchmarks by default, including MiniWoB, WebArena, VisualWebArena, WorkArena, AssistantBench, WebLINX, and OpenApps. This gives researchers a unified way to compare agent behavior across diverse web environments and task types without stitching together separate evaluation stacks. BrowserGym is also designed to be extensible, and the repository notes that creating new benchmarks mainly involves inheriting its abstract task interface.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 17
    Chatterbox

    Chatterbox

    SoTA open-source TTS

    Chatterbox is Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs and is consistently preferred in side-by-side evaluations. Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support emotion exaggeration control, a powerful feature that makes your voices stand out. Try it now on our Hugging Face Gradio app. If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service (link). It delivers reliable performance with ultra-low latency of sub-200ms—ideal for production use in agents, applications, or interactive media.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 18
    OpenClaw Medical Skills

    OpenClaw Medical Skills

    The largest open-source medical AI skills library for OpenClaw

    OpenClaw-Medical-Skills is an open-source library that provides a large collection of specialized medical capabilities designed for the OpenClaw AI agent ecosystem. The project organizes domain-specific “skills” that enable autonomous agents to perform tasks related to biomedical research, healthcare analysis, and clinical data interpretation. Each skill is packaged as a modular component that can be integrated into an OpenClaw-based AI assistant, allowing the agent to perform expert-level reasoning and workflows in medical contexts. Instead of relying on general-purpose language model responses, the repository equips AI agents with structured instructions and tools tailored to medical knowledge and datasets. This modular design allows developers and researchers to build AI systems that can access specialized medical reasoning processes, retrieve relevant biomedical information, and generate structured outputs suitable for analysis or downstream processing.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 19
    POT

    POT

    Python Optimal Transport

    This open source Python library provides several solvers for optimization problems related to Optimal Transport for signal, image processing and machine learning.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 20
    Rogue

    Rogue

    AI Agent Evaluator & Red Team Platform

    Rogue is an open-source evaluation and red-team framework designed to test the reliability, safety, and policy compliance of AI agents. The platform automatically interacts with an AI agent by generating dynamic scenarios and multi-turn conversations that simulate real-world interactions. Instead of relying solely on static test scripts, Rogue uses an agent-as-a-judge architecture where one agent probes another agent to detect failures or unexpected behaviors. The system allows developers to define specific scenarios, expected outcomes, and business rules so that the framework can verify whether an agent behaves according to required policies. During testing, Rogue records conversations and produces detailed reports that explain whether the agent passed or failed each scenario. These reports include reasoning and evidence, helping developers understand why a particular failure occurred.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 21
    Strix

    Strix

    Open-source AI hackers to find and fix your app’s vulnerabilities

    Strix is an open source agent-driven security platform that uses autonomous AI agents to identify, investigate, and validate vulnerabilities in software applications. The system is designed to mimic the behavior of real attackers by executing dynamic testing and verifying findings through proof-of-concept exploitation. Unlike traditional vulnerability scanners that rely heavily on static analysis, Strix agents actively run code, probe systems, and attempt exploitation to confirm whether vulnerabilities are genuinely exploitable. The platform is intended for developers and security teams that need rapid security assessments without the overhead of manual penetration testing engagements. Strix can orchestrate multiple cooperating agents that divide investigation tasks and collaboratively analyze complex applications or infrastructure.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 22
    Ultroid

    Ultroid

    Telegram UserBot, Built in Python Using Telethon lib

    Ultroid, a pluggable telegram userbot, made in python using Telethon! Ultroid has been written from scratch, making it more stable and less crashes. Ultroid warns you when you try to install/execute dangerous stuff (people nowadays make plugins to hack user accounts, Ultroid is safe). Unlike many others userbots that are being suspended by Heroku, Ultroid doesn't get suspended. Ultroid has been written from scratch, making it more stable and less of crashes. Error handling been done in the best way possible, such that the bot doesn't crash and stop all of a sudden. Ultroid has minimal amount of plugins (just the necessary ones) in the main repository, and all the other less-useful stuff in the addons repository. This facilitates quick deployments and lag-free use. Ultroid can install any plugin from the most of the other 'userbots' without any issue.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 23
    VibeVoice

    VibeVoice

    Open-source multi-speaker long-form text-to-speech model

    VibeVoice-1.5B is Microsoft’s frontier open-source text-to-speech (TTS) model designed for generating expressive, long-form, multi-speaker conversational audio such as podcasts. Unlike traditional TTS systems, it excels in scalability, speaker consistency, and natural turn-taking for up to 90 minutes of continuous speech with as many as four distinct speakers. A key innovation is its use of continuous acoustic and semantic speech tokenizers operating at an ultra-low frame rate of 7.5 Hz, enabling high audio fidelity with efficient processing of long sequences. The model integrates a Qwen2.5-based large language model with a diffusion head to produce realistic acoustic details and capture conversational context. Training involved curriculum learning with increasing sequence lengths up to 65K tokens, allowing VibeVoice to handle very long dialogues effectively. Safety mechanisms include an audible disclaimer and imperceptible watermarking in all generated audio to mitigate misuse risks.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 24
    labelme Image Polygonal Annotation

    labelme Image Polygonal Annotation

    Image polygonal annotation with Python

    Labelme is a graphical image annotation tool. It is written in Python and uses Qt for its graphical interface. Image annotation for polygon, rectangle, circle, line and point. Image flag annotation for classification and cleaning. Video annotation. (video annotation). GUI customization (predefined labels / flags, auto-saving, label validation, etc). Exporting VOC-format dataset for semantic/instance segmentation. (semantic segmentation, instance segmentation). Exporting COCO-format dataset for instance segmentation. (instance segmentation). The first time you run labelme, it will create a config file in ~/.labelmerc. You can edit this file and the changes will be applied the next time that you launch labelme. If you would prefer to use a config file from another location, you can specify this file with the --config flag.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 25
    pyttsx3

    pyttsx3

    Offline Text To Speech synthesis for python

    pyttsx3 is an offline text-to-speech library for Python that wraps native speech engines instead of calling cloud APIs. It is designed to work entirely without an internet connection, making it suitable for local automation, kiosks, accessibility tools, and embedded applications. On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility. The library exposes a simple but flexible API for controlling voice selection, speaking rate, volume, and other synthesis parameters from Python code. It supports both a high-level speak convenience function and a lower-level engine object with event hooks, queuing, and saving output to audio files. The repository includes examples and documentation that show how to adjust properties dynamically, persist synthesized output, and integrate pyttsx3 into GUIs or background services.
    Downloads: 21 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB