Showing 500 open source projects for "java open source"

  • 1
    pandas

    Fast, flexible and powerful Python data analysis toolkit

    ...pandas is continuously being developed to be a fundamental high-level building block for doing practical, real-world data analysis in Python, as well as a powerful and flexible open-source data analysis/manipulation tool for any language.
    Downloads: 120 This Week
  • 2
    scikit-learn

    Machine learning in Python

    scikit-learn is an open source Python module for machine learning built on NumPy, SciPy and matplotlib. It offers simple and efficient tools for predictive data analysis and is reusable in various contexts.
    Downloads: 14 This Week
  • 3
    CKAN

    CKAN is an open-source data management system (DMS) for powering data hubs

    CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work with data. It's a data management system that provides a powerful platform for cataloging, storing and accessing datasets with a rich front-end, full API (for both data and catalog), visualization tools and more. CKAN is used by national and regional government organizations throughout the European Union, the Americas, Asia, and Oceania to power a variety of official and community data portals. ...
    Downloads: 17 This Week
  • 4
    SageMaker Spark Container

    Docker image used to run data processing workloads

    Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing. The SageMaker Spark Container is a Docker image used to run batch data...
    Downloads: 1 This Week
  • 5
    Matplotlib

    matplotlib: plotting with Python

    Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. Matplotlib ships with several add-on toolkits, including 3D plotting with mplot3d, axes helpers in axes_grid1 and axis helpers in axisartist. A large number of third-party packages extend and build on Matplotlib functionality, including several higher-level plotting interfaces (seaborn, HoloViews, ggplot, ...), and a...
    Downloads: 22 This Week
  • 6
    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    An AWS Professional Service open-source Python initiative that extends the power of the pandas library to AWS, connecting DataFrames and AWS data-related services. Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON, and Excel).
    Downloads: 12 This Week
  • 7
    Arize Phoenix

    Uncover insights, surface problems, monitor, and fine-tune your LLM

    Phoenix provides ML insights at lightning speed with zero-config observability for model drift, performance, and data quality. Phoenix is an open-source ML observability library designed for the notebook. The toolset is designed to ingest model inference data for LLMs, CV, NLP, and tabular datasets. It allows data scientists to quickly visualize their model data, monitor performance, track down issues and insights, and easily export data for further improvement. Deep learning models (CV, LLM, and generative) are an amazing technology that will power many future ML use cases. ...
    Downloads: 13 This Week
  • 8
    Orange Data Mining

    Orange: Interactive data analysis

    Open source machine learning and data visualization. Build data analysis workflows visually, with a large, diverse toolbox. Perform simple data analysis with clever data visualization. Explore statistical distributions, box plots and scatter plots, or dive deeper with decision trees, hierarchical clustering, heatmaps, MDS and linear projections.
    Downloads: 64 This Week
  • 9
    Airbyte

    Data integration platform for ELT pipelines from APIs, databases

    We believe that only an open-source solution to data movement can cover the long tail of data sources while empowering data engineers to customize existing connectors. Our ultimate vision is to help you move data from any source to any destination. Airbyte already provides the largest catalog of 300+ connectors for APIs, databases, data warehouses, and data lakes.
    Downloads: 12 This Week
  • 10
    Pathway

    Python ETL framework for stream processing, real-time analytics, LLM

    Pathway is an open-source framework designed for building real-time data applications using reactive and declarative paradigms. It enables seamless integration of live data streams and structured data into analytical pipelines with minimal latency. Pathway is especially well-suited for scenarios like financial analytics, IoT, fraud detection, and logistics, where high-velocity and continuously changing data is the norm.
    Downloads: 9 This Week
  • 11
    Dagster

    An orchestration platform for the development, production

    Dagster is an orchestration platform for the development, production, and observation of data assets. Dagster as a productivity platform: With Dagster, you can focus on running tasks, or you can identify the key assets you need to create using a declarative approach. Embrace CI/CD best practices from the get-go: build reusable components, spot data quality issues, and flag bugs early. Dagster as a robust orchestration engine: Put your pipelines into production with a robust...
    Downloads: 16 This Week
  • 12
    Union Pandera

    Light-weight, flexible, expressive statistical data testing library

    The open-source framework for precision data testing for data scientists and ML engineers. Pandera provides a simple, flexible, and extensible data-testing framework for validating not only your data but also the functions that produce them. A simple, zero-configuration data testing framework for data scientists and ML engineers seeking correctness.
    Downloads: 8 This Week
  • 13
    GemGIS

    Spatial data processing for geomodeling

    GemGIS is a Python-based, open-source geographic information processing library. It is capable of preprocessing spatial data such as vector data (shapefiles, geojson files, geopackages,…), raster data (tif, png,…), data obtained from online services (WCS, WMS, WFS) or XML/KML files (soon). Preprocessed data can be stored in a dedicated Data Class to be passed to the geomodeling package GemPy in order to accelerate the model-building process.
    Downloads: 7 This Week
  • 14
    JILL.py

    A cross-platform installer for the Julia programming language

    The enhanced Python fork of JILL, Julia Installer for Linux (and every other platform), Light.
    Downloads: 5 This Week
  • 15
    ydata-profiling

    Create HTML profiling reports from pandas DataFrame objects

    ydata-profiling's primary goal is to provide a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution. Like the handy pandas df.describe() function, ydata-profiling delivers an extended analysis of a DataFrame while allowing the analysis to be exported in different formats such as HTML and JSON.
    Downloads: 7 This Week
  • 16
    folium

    Python data, Leaflet.js maps

    folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it on a Leaflet map via folium. folium makes it easy to visualize data that’s been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing rich vector/raster/HTML visualizations as markers on the map. The library has a number of...
    Downloads: 8 This Week
  • 17
    Panda-Helper

    Panda-Helper: Data profiling utility for Pandas DataFrames and Series

    Panda-Helper is a simple data-profiling utility for Pandas DataFrames and Series. Assess data quality and usefulness with minimal effort. Quickly perform initial data exploration, so you can move on to more in-depth analysis.
    Downloads: 5 This Week
  • 18
    DataChain

    AI-data warehouse to enrich, transform and analyze unstructured data

    DataChain enables multimodal API calls and local AI inferences to run in parallel over many samples as chained operations. The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. DataChain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them. The typical use cases are data curation, LLM analytics and validation, image segmentation, pose detection, and GenAI alignment. DataChain...
    Downloads: 6 This Week
  • 19
    Population Shift Monitoring

    Monitor the stability of a Pandas or Spark dataframe

    popmon is a package that allows one to check the stability of a dataset. popmon works with both pandas and Spark datasets. popmon creates histograms of features binned in time slices, and compares the stability of the profiles and distributions of those histograms using statistical tests, both over time and with respect to a reference. It works with numerical, ordinal, and categorical features, and the histograms can be higher-dimensional, e.g. it can also track correlations between any two...
    Downloads: 7 This Week
  • 20
    FiftyOne

    The open-source tool for building high-quality datasets

    The open-source tool for building high-quality datasets and computer vision models. Nothing hinders the success of machine learning systems more than poor-quality data. And without the right tools, improving a model can be time-consuming and inefficient. FiftyOne supercharges your machine learning workflows by enabling you to visualize datasets and interpret models faster and more effectively.
    Downloads: 4 This Week
  • 21
    Astropy

    Repository for the Astropy core package

    The Astropy Project is a community effort to develop a common core package for astronomy in Python and foster an ecosystem of interoperable astronomy packages. Astropy is a Python library for use in astronomy. Learn Astropy provides a portal to all of the Astropy educational material through a single dynamically searchable web page. It allows you to filter tutorials by keywords, search for filters, and make search queries in tutorials and documentation simultaneously. The Anaconda Python...
    Downloads: 6 This Week
  • 22
    Encord Active

    The toolkit to test, validate, and evaluate your models and surface

    Encord Active is an open-source toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling to supercharge model performance. Encord Active has been designed as an all-in-one open-source toolkit for improving your data quality and model performance. Use the intuitive UI to explore your data or access all the functionalities programmatically.
    Downloads: 2 This Week
  • 23
    Datumaro

    Dataset Management Framework, a Python library and a CLI tool to build

    Datumaro is a flexible Python-based dataset management framework and command-line tool for building, analyzing, transforming, and converting computer vision datasets in many popular formats. It supports importing and exporting annotations and images across a wide variety of standards like COCO, PASCAL VOC, YOLO, ImageNet, Cityscapes, and many more, enabling easy integration with different training pipelines and tools. Datumaro makes it easy to merge datasets, split them into...
    Downloads: 6 This Week
  • 24
    Dash

    Build beautiful web-based analytic apps, no JavaScript required

    Dash is a Python framework for building beautiful analytical web applications without any JavaScript. Built on top of Plotly.js, React and Flask, Dash easily achieves what an entire team of designers and engineers normally would. It ties modern UI controls and displays such as dropdown menus, sliders and graphs directly to your analytical Python code, and creates exceptional, interactive analytics apps. Dash apps are very lightweight, requiring only a limited number of lines of Python or...
    Downloads: 6 This Week
  • 25
    Dask

    Parallel computing with task scheduling

    Dask is a Python library for parallel and distributed computing, designed to scale analytics workloads from single machines to large clusters. It integrates with familiar tools like NumPy, Pandas, and scikit-learn while enabling execution across cores or nodes with minimal code changes. Dask excels at handling large datasets that don’t fit into memory and is widely used in data science, machine learning, and big data pipelines.
    Downloads: 4 This Week
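The popmon entry above describes its approach in prose: histogram each feature per time slice, then compare those histograms against a reference using statistical tests. The following is a minimal sketch of that idea for a single numeric feature, using only the Python standard library. It does not use popmon's API. The population stability index (PSI) is swapped in as the comparison statistic, and the helper names (bin_edges, histogram, psi, monitor), the 10 equal-width bins, and the 0.2 alert threshold are hypothetical choices made for illustration.

```python
import math
from bisect import bisect_right


def bin_edges(reference, n_bins):
    """Interior edges of equal-width bins spanning the reference sample (hypothetical helper)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference sample
    return [lo + i * width for i in range(1, n_bins)]


def histogram(values, edges):
    """Fraction of values per bin; out-of-range values land in the first or last bin."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected, actual, eps=1e-6):
    """Population stability index between two binned distributions (0 means identical)."""
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))  # eps avoids log(0) for empty bins
        for e, a in zip(expected, actual)
    )


def monitor(reference, slices, n_bins=10, threshold=0.2):
    """Compare each time slice with the reference; return {label: (psi, alert)}.

    The 0.2 threshold is an assumed rule of thumb, not a popmon default.
    """
    edges = bin_edges(reference, n_bins)
    ref_hist = histogram(reference, edges)
    report = {}
    for label, values in slices.items():
        score = psi(ref_hist, histogram(values, edges))
        report[label] = (round(score, 4), score > threshold)
    return report


if __name__ == "__main__":
    reference = [i % 10 for i in range(1000)]  # uniform over 0..9
    slices = {
        "slice-1": [i % 10 for i in range(1000)],             # same distribution
        "slice-2": [9] * 500 + [i % 10 for i in range(500)],  # mass shifted toward 9
    }
    for label, (score, alert) in monitor(reference, slices).items():
        print(label, score, "ALERT" if alert else "ok")
```

Fixing the bin edges from the reference sample means every time slice is histogrammed on the same grid, so per-bin proportions are directly comparable; values outside the reference range fall into the outermost bins rather than being dropped.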