Python Big Data Tools

Browse free open source Python Big Data Tools and projects below.

  • 1
    pandas

    Fast, flexible and powerful Python data analysis toolkit

    pandas is a Python data analysis library that provides high-performance, user-friendly data structures and data analysis tools. It lets you carry out entire data analysis workflows in Python without switching to a more domain-specific language, which can significantly improve performance, productivity, and collaboration. pandas is under continuous development, with the aim of being a fundamental high-level building block for practical, real-world data analysis in Python, as well as a powerful and flexible open source data analysis and manipulation tool available in any language.
    Downloads: 122 This Week
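As a small illustration of the kind of workflow the pandas description refers to, the sketch below builds a DataFrame and computes a per-group mean. The data, column names, and variable names are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Lima", "Lima"],
        "temp_c": [3.0, 5.0, 19.0, 21.0],
    }
)

# Split the rows by city, then compute the mean temperature of each group.
mean_by_city = df.groupby("city")["temp_c"].mean()
print(mean_by_city)
```

The result is a Series indexed by city, so a single value can be read with `mean_by_city["Oslo"]`.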
  • 2
    marimo

    A reactive notebook for Python

    marimo is an open-source reactive notebook for Python: reproducible, git-friendly, executable as a script, and shareable as an app. marimo notebooks are highly interactive, designed for collaboration, and fit for the modern Pythonista. Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing notebook state. marimo's reactive UI elements, like data frame GUIs and plots, make working with data feel refreshingly fast, futuristic, and intuitive. Version with git, run as Python scripts, import symbols from a notebook into other notebooks or Python files, and lint or format with your favorite tools. You'll always be able to reproduce your collaborators' results. Notebooks are executed in a deterministic order, with no hidden state: delete a cell and marimo deletes its variables while updating affected cells.
    Downloads: 4 This Week
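The reactive behaviour described for marimo (run one cell, and the cells that depend on it run too) can be sketched with a small dependency graph. This is a toy model for illustration only, not marimo's implementation or API; the `Notebook` class and its methods are hypothetical, and each cell is assumed to define exactly one variable.

```python
from graphlib import TopologicalSorter


class Notebook:
    """Toy reactive notebook: each cell computes one variable from others."""

    def __init__(self):
        self.cells = {}   # variable name -> (dependency names, function)
        self.values = {}  # variable name -> last computed value

    def cell(self, name, deps, func):
        # Register (or redefine) the cell that produces `name`.
        self.cells[name] = (tuple(deps), func)

    def _affected(self, name):
        # The cell itself plus every cell that depends on it, transitively.
        affected = {name}
        changed = True
        while changed:
            changed = False
            for other, (deps, _) in self.cells.items():
                if other not in affected and affected.intersection(deps):
                    affected.add(other)
                    changed = True
        return affected

    def run(self, name):
        affected = self._affected(name)
        # Order the whole graph so dependencies always run before dependents.
        graph = {n: set(deps) for n, (deps, _) in self.cells.items()}
        for n in TopologicalSorter(graph).static_order():
            if n in affected:
                deps, func = self.cells[n]
                self.values[n] = func(*(self.values[d] for d in deps))
        return sorted(affected)


nb = Notebook()
nb.cell("x", [], lambda: 2)
nb.cell("y", ["x"], lambda x: x * 10)
nb.cell("z", ["y"], lambda y: y + 1)
nb.cell("w", [], lambda: "independent")

first = nb.run("x")          # runs x, then y, then z; w is not touched
nb.cell("x", [], lambda: 5)  # edit the first cell
rerun = nb.run("x")          # y and z are recomputed automatically
print(rerun, nb.values)
```

Because the execution order comes from the dependency graph rather than from the order in which cells were typed, the result does not depend on hidden state left over from earlier runs.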
  • 3
    fooltrader

    Quant framework for stock

    Build a standard data schema, then implement connectors that import the data into systems you are familiar with for analysis. fooltrader is a quantitative analysis and trading system designed using big data technology, covering data capture, cleaning, structuring, calculation, display, backtesting, and trading. Its goal is to provide a unified framework across the whole market (stocks, futures, bonds, foreign exchange, digital currency, macroeconomics, etc.) for research, backtesting, forecasting, and trading. Its intended users include quantitative traders, teachers and students majoring in finance, people interested in economic data, programmers, and people who like freedom and the spirit of exploration. You can write a strategy in either an event-driven or a time-walk style, and view and analyze its performance in a uniform way.
    Downloads: 1 This Week
  • 4
    gravitino

    Unified metadata lake for data & AI assets.

    Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.
    Downloads: 7 This Week
  • 5
    Augustus

    PMML-compliant scoring engine and analytic toolkit

    Augustus development has moved to Google Code. The new project page is augustus.googlecode.com. New releases are not currently being published on SourceForge. Augustus is designed for statistical and data mining models, and it produces and consumes models with tens of thousands of segments. Versions of Augustus support PMML 3, 4.0.1, and 4.1.
    Downloads: 2 This Week
  • 6
    Old File Delete

    Clean up old files with a single click.

    OldFileDelete (OFD) is a lightweight and efficient utility designed for those who value minimalism and order. The app helps you instantly clear selected folders of accumulated digital clutter. Featuring a modern flat design, the interface is intuitive: simply select a folder, specify the number of days, and the program will find and remove outdated files. No complex settings—just cleanliness and speed.
    Downloads: 1 This Week
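The workflow described for Old File Delete (select a folder, specify a number of days, remove outdated files) can be sketched in a few lines of Python. This is not the tool's own code. The function names are hypothetical, and the sketch assumes that "old" means a modification time earlier than the cutoff and that subfolders are searched too; the listing states neither.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_old_files(folder, days, now=None):
    """Return files under `folder` last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def delete_old_files(folder, days, dry_run=True):
    """List old files; remove them only when dry_run is False."""
    old = find_old_files(folder, days)
    if not dry_run:
        for path in old:
            path.unlink()
    return old


# Demonstration in a throwaway directory, so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    stale = Path(tmp, "stale.log")
    fresh = Path(tmp, "fresh.log")
    stale.write_text("old")
    fresh.write_text("new")

    # Backdate one file by 40 days.
    forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
    os.utime(stale, (forty_days_ago, forty_days_ago))

    removed = [p.name for p in delete_old_files(tmp, days=30, dry_run=False)]
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed, remaining)
```

Defaulting `dry_run` to `True` is a deliberate choice in this sketch: a caller sees the list of candidate files before anything is deleted.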
  • 7
    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    DSTK - DataScience ToolKit is free, open-source software for statistical analysis, data visualization, text analysis, and predictive analytics. A newer version with a smaller file size can be found at: https://sourceforge.net/projects/dstk3/ It is designed to be straightforward, easy to use, and familiar to SPSS users. While JASP offers more statistical features, DSTK aims to be a broad workbench that also includes text analysis and predictive analytics features. You may still use JASP for advanced data editing and RapidMiner for advanced prediction modeling. DSTK is written in C#, Java, and Python to interface with R, NLTK, and Weka. It can be expanded with plugins using R scripts. We have also created plugins for more statistical functions, and for Big Data Analytics with Microsoft Azure HDInsight (Spark Server) with Livy. License: R, RStudio, NLTK, SciPy, SKLearn, MatPlotLib, Weka, ... each has its own license.
    Downloads: 0 This Week
  • 8
    Modin

    Scale your pandas workflows by changing a single line of code

    Modin uses Ray, Dask, or Unidist to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. You do not need to know the available hardware resources in advance to use Modin, and you do not need to specify how to distribute or place data. Modin acts as a drop-in replacement for pandas, which means that you can continue using your previous pandas notebooks, unchanged, while experiencing a considerable speedup thanks to Modin, even on a single machine. Once you've changed your import statement, you're ready to use Modin just like you would pandas.
    Downloads: 0 This Week
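The single-line change mentioned in the Modin description is the import statement. A minimal sketch follows, assuming Modin is installed together with one of its engines (Ray, Dask, or Unidist). The fallback to plain pandas is an addition for this example so that it also runs where Modin is absent, and the data is made up.

```python
try:
    # The single changed line: Modin exposes the pandas API under modin.pandas.
    import modin.pandas as pd
except ImportError:
    # Fallback (an assumption of this sketch) for machines without Modin.
    import pandas as pd

# Everything below is ordinary pandas code and is the same in both cases.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
df["total"] = df["a"] + df["b"]
total_sum = int(df["total"].sum())
print(total_sum)
```

On a frame this small no speedup should be expected; the point is only that the code after the import does not change.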
  • 9
    Vaex

    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python

    Data science solutions, insights, dashboards, machine learning, deployment. We start at 100GB. Vaex is a high-performance Python library for lazy out-of-core DataFrames (similar to pandas), used to visualize and explore big tabular datasets. It calculates statistics such as mean, sum, count, and standard deviation on an N-dimensional grid for more than a billion (10^9) samples/rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy, and lazy computations for best performance (no memory wasted). Cut development time by 80%. Your prototype is your solution. Create automatic pipelines for any model.
    Downloads: 0 This Week
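The memory-mapping idea mentioned for Vaex can be illustrated with the Python standard library alone. The sketch below does not use Vaex or its API. It maps a file of raw float64 values into memory and computes the mean chunk by chunk, so the file is never loaded as one in-memory copy. The file name, sizes, and variable names are assumptions chosen for the demo.

```python
import mmap
import os
import tempfile
from array import array

N = 100_000     # number of float64 samples (kept small for the demo)
CHUNK = 10_000  # samples processed per step

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "column.f64")

    # Store the values 0, 1, ..., N-1 on disk as raw float64.
    with open(path, "wb") as f:
        array("d", (float(i) for i in range(N))).tofile(f)

    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Reinterpret the mapped bytes as float64 without copying the file.
            with memoryview(mm).cast("d") as view:
                count = len(view)
                total = 0.0
                for start in range(0, count, CHUNK):
                    total += sum(view[start:start + CHUNK])

mean = total / count
print(mean)
```

The operating system pages data in as each chunk is read, which is what allows the same pattern to work on files larger than available memory.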
  • 10
    abu

    Abu quantitative trading system (stocks, options, futures, bitcoin)

    The Abu Quantitative Integrated AI Big Data System combines a K-line pattern system, a classic indicator system, a trend analysis system, a time series dimension system, a statistical probability system, and a traditional moving average system to conduct in-depth quantitative analysis of investment instruments. It lets users skip the stage of writing complex quantification code, which makes it more suitable for ordinary users, and moves toward the era of vectorization 2.0. The system combines hundreds of seed quantitative models, such as a financial time series loss model, deep pattern quality assessment model, long and short pattern combination evaluation model, long pattern stop-loss strategy model, short pattern covering strategy model, big data K-line pattern historical portfolio fitting model, trading position mentality model, dopamine quantification model, inertial residual resistance support model, long-short swap revenge probability model, strong and weak confrontation model, and trend angle change rate model.
    Downloads: 0 This Week