Showing 1058 open source projects for "python data analysis"

  • 1
    Lightly

    A Python library for self-supervised learning on images

    A Python library for self-supervised learning on images. We at Lightly are passionate engineers who want to make deep learning more efficient. That is why, together with our community, we want to popularize the use of self-supervised methods to understand and curate raw image data. Our solution can be applied before any data annotation step, and the learned representations can be used to visualize and analyze datasets.
    Downloads: 0 This Week
  • 2
    Get Physics Done (GPD)

    The first open-source agentic AI physicist

    Get Physics Done (GPD) is an open-source project designed to accelerate scientific research in physics by leveraging modern computational tools and automation techniques. It aims to simplify the process of performing simulations, calculations, and experimental analysis by providing structured workflows that integrate computational physics methods with reproducible research practices. The project focuses on reducing the friction involved in setting up experiments, running simulations, and...
    Downloads: 1 This Week
  • 3
    SurfSense

    Connect any LLM to your internal knowledge sources

    SurfSense is an open-source AI research and knowledge assistant platform that connects any large language model to internal knowledge sources so teams and individuals can explore, query, and collaborate on insights in real time. Built as an alternative to proprietary tools like NotebookLM, Perplexity, and Glean, SurfSense allows integrations with a wide range of external data sources including Slack, Notion, Google Drive, GitHub, YouTube, and many enterprise systems, making it possible to...
    Downloads: 8 This Week
  • 4
    UMAP

    Uniform Manifold Approximation and Projection

    Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualization similarly to t-SNE, but also for general non-linear dimension reduction. It is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low-dimensional projection of the data that has the closest possible equivalent fuzzy topological structure. First of all, UMAP is fast. It can handle large datasets and high dimensional...
    Downloads: 4 This Week
  • 5
    Alibi Detect

    Algorithms for outlier, adversarial and drift detection

    Alibi Detect is an open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series. Both TensorFlow and PyTorch backends are supported for drift detection.
    Downloads: 5 This Week
  • 6
    BioNeMo

    BioNeMo Framework: For building and adapting AI models

    BioNeMo is an AI-powered framework developed by NVIDIA for protein and molecular generation using deep learning models. It provides researchers and developers with tools to design, analyze, and optimize biological molecules, aiding in drug discovery and synthetic biology applications.
    Downloads: 1 This Week
  • 7
    PySINDy

    A package for the sparse identification of nonlinear dynamical systems

    PySINDy is a Python library that implements the Sparse Identification of Nonlinear Dynamics (SINDy) method for discovering mathematical models of dynamical systems from data. The framework focuses on identifying governing equations that describe the behavior of complex physical systems by selecting sparse combinations of candidate functions. Instead of fitting a purely predictive machine learning model, PySINDy attempts to recover interpretable differential equations that explain how a system evolves over time. ...
    Downloads: 0 This Week
  • 8
    Paperless-AI

    AI-powered document analysis and tagging for Paperless-ngx

    Paperless-AI is an AI-powered extension designed to enhance document management within Paperless-ngx by automating analysis, classification, and organization tasks. It continuously monitors incoming documents and processes them using various AI backends, enabling automatic assignment of titles, tags, document types, and correspondents. It integrates with multiple OpenAI-compatible services as well as local models, giving users flexibility in how document intelligence is handled. A key...
    Downloads: 2 This Week
  • 9
    Sparrow

    Structured data extraction and instruction calling with ML, LLM

    Sparrow is an open-source platform designed to extract structured information from documents, images, and other unstructured data sources using machine learning and large language models. The system focuses on transforming complex documents such as invoices, receipts, forms, and scanned pages into structured formats like JSON that can be processed by downstream applications. It combines several components, including OCR pipelines, vision-language models, and LLM-based reasoning modules to...
    Downloads: 3 This Week
  • 10
    MCP Atlassian

    MCP server that integrates Confluence and Jira

    The MCP Atlassian server integrates Atlassian products like Confluence and Jira with the Model Context Protocol. It supports both Cloud and Server/Data Center deployments, enabling AI models to interact with these platforms securely.
    Downloads: 2 This Week
  • 11
    LightAutoML

    Fast and customizable framework for automatic ML model creation

    LightAutoML is an automated machine learning (AutoML) framework optimized for efficient model training and hyperparameter tuning, focusing on both tabular and text data.
    Downloads: 0 This Week
  • 12
    The AI Scientist-v2

    Workshop-Level Automated Scientific Discovery via Agentic Tree Search

    AI-Scientist-v2 is an advanced autonomous research system designed to perform end-to-end scientific discovery using large language models and agent-based orchestration. The platform is capable of generating original research ideas, designing and executing experiments, analyzing and visualizing results, and producing full academic papers without direct human intervention. It introduces a generalized framework that removes reliance on predefined templates, enabling broader applicability across...
    Downloads: 0 This Week
  • 13
    MineContext

    MineContext is your proactive context-aware AI partner

    MineContext is an open-source, proactive AI assistant designed to capture, understand, and leverage a user’s digital context in order to provide meaningful insights, summaries, and productivity support. The system continuously collects contextual data from sources such as screenshots and user activity, then processes and organizes this information into structured knowledge that can be reused later. Unlike traditional chat-based assistants, MineContext operates in the background and delivers...
    Downloads: 5 This Week
  • 14
    VoxelMorph

    Unsupervised Learning for Image Registration

    VoxelMorph is an open-source deep learning framework designed for medical image registration, a process that aligns multiple medical scans into a common spatial coordinate system. Traditional image registration techniques typically rely on optimization procedures that must be executed separately for each pair of images, which can be computationally expensive and slow. VoxelMorph approaches the problem using neural networks that learn to predict deformation fields that transform one image so...
    Downloads: 0 This Week
  • 15
    LLM Vision

    Visual intelligence for your home.

    LLM Vision is an open-source integration for Home Assistant that adds multimodal large language model capabilities to smart home environments. The project enables Home Assistant to analyze images, video files, and live camera feeds using vision-capable AI models. Instead of relying only on traditional object detection pipelines, it allows users to send prompts about visual content and receive contextual descriptions or answers about what is happening in camera footage. The system can process...
    Downloads: 0 This Week
  • 16
    Transformer Debugger

    Tool for exploring and debugging transformer model behaviors

    Transformer Debugger (TDB) is a research tool developed by OpenAI’s Superalignment team to investigate and interpret the behaviors of small language models. It combines automated interpretability methods with sparse autoencoders, enabling researchers to analyze how specific neurons, attention heads, and latent features contribute to a model’s outputs. TDB allows users to intervene directly in the forward pass of a model and observe how such interventions change predictions, making it...
    Downloads: 0 This Week
  • 17
    MCP Server Qdrant

    An official Qdrant Model Context Protocol (MCP) server implementation

    The Qdrant MCP Server is an official Model Context Protocol server that integrates with the Qdrant vector search engine. It acts as a semantic memory layer, allowing for the storage and retrieval of vector-based data, enhancing the capabilities of AI applications requiring semantic search functionalities.
    Downloads: 4 This Week
  • 18
    cognee

    Deterministic LLM Outputs for AI Applications and AI Agents

    Cognee implements scalable, modular data pipelines that allow for creating the LLM-enriched data layer using graph and vector stores. Cognee acts as a semantic memory layer, unveiling hidden connections within your data and infusing it with your company's language and principles. This self-optimizing process ensures ultra-relevant, personalized, and contextually aware LLM retrievals. Any kind of data works: unstructured text or raw media files, PDFs, tables, presentations, JSON files, and so...
    Downloads: 1 This Week
  • 19
    DeiT (Data-efficient Image Transformers)
    DeiT (Data-efficient Image Transformers) shows that Vision Transformers can be trained competitively on ImageNet-1k without external data by using strong training recipes and knowledge distillation. Its key idea is a specialized distillation strategy—including a learnable “distillation token”—that lets a transformer learn effectively from a CNN or transformer teacher on modest-scale datasets. The project provides compact ViT variants (Tiny/Small/Base) that achieve excellent...
    Downloads: 0 This Week
  • 20
    MLRun

    Machine Learning automation and tracking

    MLRun is an open MLOps framework for quickly building and managing continuous ML and generative AI applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications, significantly reducing engineering efforts, time to production, and computation resources. MLRun breaks the silos between data, ML, software, and DevOps/MLOps teams, enabling collaboration and fast continuous...
    Downloads: 4 This Week
  • 21
    SpeechRecognition

    Speech recognition module for Python

    ...The first software requirement is Python 2.6, 2.7, or Python 3.3+. This is required to use the library. PyAudio is required if and only if you want to use microphone input (Microphone). PyAudio version 0.2.11+ is required, as earlier versions have known memory management bugs when recording from microphones in certain situations. To hack on this library, first make sure you have all the requirements listed in the "Requirements" section.
    Downloads: 10 This Week
  • 22
    NVIDIA PhysicsNeMo

    Open-source deep-learning framework for building and training

    ...The framework focuses on the emerging field of physics-informed machine learning, where neural networks are used alongside physical equations to model complex scientific systems. PhysicsNeMo provides modular Python components that allow developers to create scalable training and inference pipelines for models that combine data-driven learning with physics-based constraints. It is built on top of the PyTorch ecosystem and integrates with GPU-accelerated computing environments to handle computationally demanding simulations and datasets. The framework supports a wide range of scientific applications, including computational fluid dynamics, climate modeling, weather prediction, and engineering simulations.
    Downloads: 5 This Week
  • 23
    Core ML Tools

    Core ML tools contain supporting tools for Core ML model conversion

    Use Core ML Tools (coremltools) to convert machine learning models from third-party libraries to the Core ML format. This Python package contains the supporting tools for converting models from training libraries. Core ML is an Apple framework to integrate machine learning models into your app. Core ML provides a unified representation for all models. Your app uses Core ML APIs and user data to make predictions, and to fine-tune models, all on the user’s device.
    Downloads: 5 This Week
  • 24
    DocArray

    The data structure for multimodal data

    DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. A door to the multimodal world: a super-expressive data structure for representing complicated, mixed, or nested text, image, video, audio, and 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt, etc. Data...
    Downloads: 0 This Week
  • 25
    Video-subtitle-remover (VSR)

    AI tool that removes hardcoded subtitles and text from videos locally

    Video Subtitle Remover is an AI-based application designed to remove hardcoded subtitles from videos and generate new files without the embedded text. Video Subtitle Remover analyzes video frames and detects subtitle regions, then replaces the removed areas using an AI algorithm that fills the space with reconstructed visual content. This process aims to maintain the original resolution and visual continuity of the video after subtitle removal. It allows users to define a specific subtitle...
    Downloads: 121 This Week