Showing 25 open source projects for "python data analysis"

View related business solutions
  • Resco toolkit for building mobile apps Icon
    Resco toolkit for building mobile apps

    A no-code toolkit for building responsive and resilient mobile business applications for Microsoft Power Platform, Dynamics 365, Dataverse and Salesfo

    Deploying mobile apps with Resco takes days, not months—all without writing a single line of code. Workers can download the Resco app from AppStore, Google Play, or Windows Store, log into your company environment, and instantly use the app you have published on any device.
    Learn More
  • The top-rated AI recruiting platform for faster, smarter hiring. Icon
    The top-rated AI recruiting platform for faster, smarter hiring.

    Humanly is an AI recruiting platform that automates candidate conversations, screening, and scheduling.

    Humanly is an AI-first recruiting platform that helps talent teams hire in days, not months—without adding headcount. Our intuitive CRM pairs with powerful agentic AI to engage and screen every candidate instantly, surfacing top talent fast. Built on insights from over 4 million candidate interactions, Humanly delivers speed, structure, and consistency at scale—engaging 100% of interested candidates and driving pipeline growth through targeted outreach and smart re-engagement. We integrate seamlessly with all major ATSs to reduce manual work, improve data flow, and enhance recruiter efficiency and candidate experience. Independent audits ensure our AI remains fair and bias-free, so you can hire confidently.
    Learn More
  • 1
    Cookiecutter Data Science

    Cookiecutter Data Science

    Project structure for doing and sharing data science work

    A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. When we think about data analysis, we often think just about the resulting reports, insights, or visualizations. While these end products are generally the main event, it's easy to focus on making the products look nice and ignore the quality of the code that generates them. Because these end products are created programmatically, code quality is still important! ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    AI Data Science Team

    AI Data Science Team

    An AI-powered data science team of agents

    AI Data Science Team is a Python library and agent ecosystem designed to accelerate and automate common data science workflows by modeling them as specialized AI “agents” that can be orchestrated to perform tasks like data cleaning, transformation, analysis, visualization, and machine learning. It provides a modular agent framework where each agent focuses on a step in the typical data science pipeline — for example, loading data from CSV/Excel files, cleaning and wrangling messy datasets, engineering predictive features, building models with AutoML, connecting to SQL databases, and producing visual outputs — all driven by natural language or programmatic instructions. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Positron

    Positron

    Positron, a next-generation data science IDE

    Positron is a next-generation integrated development environment (IDE) created by Posit PBC (formerly RStudio Inc) specifically tailored for data science workflows in Python, R, and multi-language ecosystems. It aims to unify exploratory data analysis, production code, and data-app authoring in a single environment so that data scientists move from “question → insight → application” without switching tools. Built on the open-source Code-OSS foundation, Positron provides a familiar coding experience along with specialized panes and tooling for variable inspection, data-frame viewing, plotting previews, and interactive consoles designed for analytical work. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 4
    marimo

    marimo

    A reactive notebook for Python

    marimo is an open-source reactive notebook for Python, reproducible, git-friendly, executable as a script, and shareable as an app. marimo notebooks are reproducible, extremely interactive, designed for collaboration (git-friendly!), deployable as scripts or apps, and fit for modern Pythonista. Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing the notebook state. marimo's reactive UI elements, like data frame GUIs and plots, make working with data feel refreshingly fast, futuristic, and intuitive. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Non Emergency Medical Transportation (NEMT) Software Icon
    Non Emergency Medical Transportation (NEMT) Software

    Healthcare providers in search of a scheduling and dispatch solution for non emergency medical transportation

    NovusMED is an ecosystem that includes call center, administrative, driver applications, and client/clinic booking applications. NovusMED is the platform of choice for a wide range of medical transportation services and includes configurations for brokerage, providers, senior, community, and home health programs. Accurately manage calls and patient information. Monitor real-time performance and adjust resource capacity to meet changes in service demand. Manage will calls, confirmation calls, and recurring trips/standing orders in real time. Improved mileage reimbursement and cost calculators to manage multiple contractors, funding sources (payors), multiple providers, and volunteer driver programs. Enhanced credential management for vehicles and drivers. Manage subcontractor outsourcing with provider mobile, trip bidding, and trip offers. Able to see the closest vehicle and perform immediate bookings.
    Learn More
  • 5
    Dask

    Dask

    Parallel computing with task scheduling

    Dask is a Python library for parallel and distributed computing, designed to scale analytics workloads from single machines to large clusters. It integrates with familiar tools like NumPy, Pandas, and scikit-learn while enabling execution across cores or nodes with minimal code changes. Dask excels at handling large datasets that don’t fit into memory and is widely used in data science, machine learning, and big data pipelines.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    NannyML

    NannyML

    Detecting silent model failure. NannyML estimates performance

    NannyML is an open-source python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface, and interactive visualizations, is completely model-agnostic, and currently supports all tabular classification use cases.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 7
    Great Expectations

    Great Expectations

    Always know what to expect from your data

    Great Expectations helps data teams eliminate pipeline debt, through data testing, documentation, and profiling. Software developers have long known that testing and documentation are essential for managing complex codebases. Great Expectations brings the same confidence, integrity, and acceleration to data science and data engineering teams. Expectations are assertions for data. They are the workhorse abstraction in Great Expectations, covering all kinds of common data issues. Expectations...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    PySyft

    PySyft

    Data science on data without acquiring a copy

    Most software libraries let you compute over the information you own and see inside of machines you control. However, this means that you cannot compute on information without first obtaining (at least partial) ownership of that information. It also means that you cannot compute using machines without first obtaining control over those machines. This is very limiting to human collaboration and systematically drives the centralization of data, because you cannot work with a bunch of data...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    AWS SDK for pandas

    AWS SDK for pandas

    Easy integration with Athena, Glue, Redshift, Timestream, Neptune

    aws-sdk-pandas (formerly AWS Data Wrangler) bridges pandas with the AWS analytics stack so DataFrames flow seamlessly to and from cloud services. With a few lines of code, you can read from and write to Amazon S3 in Parquet/CSV/JSON/ORC, register tables in the AWS Glue Data Catalog, and query with Amazon Athena directly into pandas. The library abstracts efficient patterns like partitioning, compression, and vectorized I/O so you get performant data lake operations without hand-rolling boilerplate. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Cloud-Based Software Licensing - Zentitle by Nalpeiron Icon
    Cloud-Based Software Licensing - Zentitle by Nalpeiron

    The #1 Software Licensing Solution. Release new Software License Models fast with no engineering. Increase software sales and drive up revenues.

    1000’s software companies have used Zentitle to launch new software products fast and control their entitlements easily - many going from startup to IPO on our platform. Our software monetization infrastructure allows you to easily build or
    Learn More
  • 10
    Awesome Fraud Detection Research Papers

    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    ClearML

    ClearML

    Streamline your ML workflow

    ClearML is an open source platform that automates and simplifies developing and managing machine learning solutions for thousands of data science teams all over the world. It is designed as an end-to-end MLOps suite allowing you to focus on developing your ML code & automation, while ClearML ensures your work is reproducible and scalable. The ClearML Python Package for integrating ClearML into your existing scripts by adding just two lines of code, and optionally extending your experiments and other workflows with ClearML powerful and versatile set of classes and methods. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    SageMaker Training Toolkit

    SageMaker Training Toolkit

    Train machine learning models within Docker containers

    Train machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 13
    NVIDIA Merlin

    NVIDIA Merlin

    Library providing end-to-end GPU-accelerated recommender systems

    NVIDIA Merlin is an open-source library that accelerates recommender systems on NVIDIA GPUs. The library enables data scientists, machine learning engineers, and researchers to build high-performing recommenders at scale. Merlin includes tools to address common feature engineering, training, and inference challenges. Each stage of the Merlin pipeline is optimized to support hundreds of terabytes of data, which is all accessible through easy-to-use APIs. For more information, see NVIDIA...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Recommenders

    Recommenders

    Best practices on recommendation systems

    The Recommenders repository provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. The module reco_utils contains functions to simplify common tasks used when developing and evaluating recommender systems. Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    MCPower

    MCPower

    MCPower — simple Monte Carlo power analysis for complex models

    MCPower-GUI is a desktop application that provides a graphical interface for the MCPower Monte Carlo power analysis library. It guides users through the full workflow across three tabs: Model setup (formula input with live parsing, CSV data upload with auto-detected variable types, effect size sliders, and correlation editing), Analysis configuration (find power for a given sample size or find the minimum sample size for a target power, with multiple testing correction and scenario analysis), and Results (interactive charts, exportable tables, and auto-generated Python replication scripts). ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 16
    sadsa

    sadsa

    SADSA (Software Application for Data Science and Analytics)

    SADSA (Software Application for Data Science and Analytics) is a Python-based desktop application designed to simplify statistical analysis, machine learning, and data visualization for students, researchers, and data professionals. Built using Python for the GUI, SADSA provides a menu-driven interface for handling datasets, applying transformations, running advanced statistical tests, machine learning algorithms, and generating insightful plots — all without writing code.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    SageMaker Inference Toolkit

    SageMaker Inference Toolkit

    Serve machine learning models within a Docker container

    Serve machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. Once you have a trained model, you can include it in a Docker container that runs your inference code. A container provides an effectively isolated environment, ensuring a consistent runtime regardless of where the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Orchest

    Orchest

    Build data pipelines, the easy way

    Code, run and monitor your data pipelines all from your browser! From idea to scheduled pipeline in hours, not days. Interactively build your data science pipelines in our visual pipeline editor. Versioned as a JSON file. Run scripts or Jupyter notebooks as steps in a pipeline. Python, R, Julia, JavaScript, and Bash are supported. Parameterize your pipelines and run them periodically on a cron schedule.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 19
    AWS Step Functions Data Science SDK

    AWS Step Functions Data Science SDK

    For building machine learning (ML) workflows and pipelines on AWS

    The AWS Step Functions Data Science SDK is an open-source library that allows data scientists to easily create workflows that process and publish machine learning models using Amazon SageMaker and AWS Step Functions. You can create machine learning workflows in Python that orchestrate AWS infrastructure at scale, without having to provision and integrate the AWS services separately.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    ML workspace

    ML workspace

    All-in-one web-based IDE specialized for machine learning

    All-in-one web-based development environment for machine learning. The ML workspace is an all-in-one web-based IDE specialized for machine learning and data science. It is simple to deploy and gets you started within minutes to productively built ML solutions on your own machines. This workspace is the ultimate tool for developers preloaded with a variety of popular data science libraries (e.g., Tensorflow, PyTorch, Keras, Sklearn) and dev tools (e.g., Jupyter, VS Code, Tensorboard)...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Data Science Notes

    Data Science Notes

    Curated collection of data science learning materials

    Data Science Notes is a large, curated collection of data science learning materials, with explanations, code snippets, and structured notes across the typical end-to-end workflow. It spans foundational math and statistics through data wrangling, visualization, machine learning, and practical project organization. The content emphasizes hands-on understanding by pairing narrative notes with runnable examples, making it useful for both self-study and classroom settings. Because it aggregates...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Forecasting Best Practices

    Forecasting Best Practices

    Time Series Forecasting Best Practices & Examples

    ...Rather than creating implementations from scratch, we draw from existing state-of-the-art libraries and build additional utilities around processing and featuring the data, optimizing and evaluating models, and scaling up to the cloud. The examples and best practices are provided as Python Jupyter notebooks and R markdown files and a library of utility functions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    SageMaker Containers

    SageMaker Containers

    Create SageMaker-compatible Docker containers

    Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and reliable training process. The SageMaker Training Toolkit can be easily added to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    TensorWatch

    TensorWatch

    Debugging, monitoring and visualization for Python Machine Learning

    TensorWatch is an open source debugging and visualization platform created by Microsoft Research to support machine learning, deep learning, and reinforcement learning workflows. It enables developers to observe training behavior in real time through interactive visualizations, primarily within Jupyter Notebook environments. The tool treats most data interactions as streams, allowing flexible routing, storage, and visualization of metrics generated during model training. A distinctive...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25

    slycat

    Web-based data science analysis and visualization platform.

    This is Slycat - a web-based data science analysis and visualization platform, created at Sandia National Laboratories. The goal of the Slycat project is to develop processes, tools and techniques to support data science, particularly analysis of large, high-dimensional data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB