parallel free download

Showing 352 open source projects for "parallel"

Filter: C++

1

EnvPool

C++-based high-performance parallel environment execution engine

EnvPool is a fast, asynchronous, and parallel RL environment library designed for scaling reinforcement learning experiments. Developed by Sea AI Lab (SAIL) in Singapore, it uses a C++ backend and a Python frontend for very high-speed environment interaction and supports thousands of environments running in parallel on a single machine. It is compatible with the Gymnasium API and RLlib, making it suitable for scalable training pipelines.

Downloads: 11 This Week

Last Update: 2026-04-14
2

CUDA Core Compute Libraries (CCCL)

CUDA Core Compute Libraries

...It brings together Thrust, CUB, and libcudacxx, which collectively provide high-level abstractions, low-level performance primitives, and a CUDA-compatible standard library for GPU programming. The goal of CCCL is to simplify CUDA development by offering reusable building blocks that enable developers to write efficient and scalable parallel code without starting from scratch. Thrust provides a high-level interface for parallel algorithms, while CUB delivers highly optimized primitives for device-level operations, and libcudacxx ensures compatibility with modern C++ standards. By unifying these components, CCCL reduces duplication and improves developer productivity while maintaining performance across different GPU architectures.

Downloads: 11 This Week

Last Update: 2 days ago
3

LightGBM

Gradient boosting framework based on decision tree algorithms

LightGBM (Light Gradient Boosting Machine) is a high-performance, open source gradient boosting framework based on decision tree algorithms. Compared to other boosting frameworks, LightGBM offers several advantages in terms of speed, efficiency and accuracy. Parallel learning experiments have shown that LightGBM can attain linear speed-up across multiple machines for training in specific settings, all while consuming less memory. LightGBM supports parallel and GPU learning, and can handle large-scale data. It has become widely used for ranking, classification and many other machine learning tasks.

Downloads: 1 This Week

Last Update: 2025-02-15
4

FreeFileSync

Folder comparison and synchronization software

FreeFileSync is folder comparison and synchronization software that creates and manages backup copies of all your important files. Instead of copying every file every time, FreeFileSync determines the differences between a source and a target folder and transfers only the minimum amount of data needed. FreeFileSync is open source software, available for Windows, macOS, and Linux.
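
The compare-then-copy idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not FreeFileSync's actual algorithm; the function names `plan_sync` and `apply_sync` are hypothetical, and the sketch handles one direction only (source to target) without deletions.

```python
import filecmp
import os
import shutil
from typing import List


def plan_sync(source: str, target: str) -> List[str]:
    """Return relative paths of files that are missing from or differ in target."""
    to_copy = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            src_path = os.path.join(root, name)
            rel_path = os.path.relpath(src_path, source)
            dst_path = os.path.join(target, rel_path)
            # shallow=False compares file contents rather than only size and mtime.
            if not os.path.exists(dst_path) or not filecmp.cmp(
                src_path, dst_path, shallow=False
            ):
                to_copy.append(rel_path)
    return sorted(to_copy)


def apply_sync(source: str, target: str, rel_paths: List[str]) -> None:
    """Copy only the planned files, creating target subfolders as needed."""
    for rel_path in rel_paths:
        dst_path = os.path.join(target, rel_path)
        os.makedirs(os.path.dirname(dst_path), exist_ok=True)
        # copy2 also preserves timestamps where the platform allows it.
        shutil.copy2(os.path.join(source, rel_path), dst_path)
```

Calling `plan_sync` first and `apply_sync` second keeps the comparison step separate from the transfer step, so the plan can be inspected before anything is written.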

1 Review

Downloads: 69 This Week

Last Update: 2026-03-29
5

CGraph

A general, third-party-dependency-free, cross-platform

CGraph is a high-performance, cross-platform Directed Acyclic Graph (DAG) framework implemented in pure C++ with no third-party dependencies, designed for building complex task pipelines and parallel execution workflows. It allows developers to model computational processes as graph structures, where nodes represent tasks and edges define dependencies, enabling efficient scheduling and execution. The framework includes a pipeline system that supports sequential and parallel execution, conditional branching, aggregation, and loop control, making it highly flexible for advanced workflows. ...
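
To make the "nodes are tasks, edges are dependencies" model concrete, here is a minimal sketch of dependency-ordered parallel execution using only the Python standard library. It does not use or mirror CGraph's C++ API; `run_dag` is a hypothetical helper, and it assumes every dependency name is also a key in `tasks`.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_workers=4):
    """Run callables in dependency order, overlapping independent tasks.

    tasks: dict mapping task name -> zero-argument callable.
    deps:  dict mapping task name -> iterable of task names it depends on.
    Returns a dict mapping task name -> result.
    """
    sorter = TopologicalSorter({name: deps.get(name, ()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies have all finished.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            # Block until at least one running task completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)  # may unlock dependent tasks
    return results
```

In a diamond-shaped graph (one loader, two independent branches, one merge step), the two branches are submitted together and the merge step starts only after both have finished.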

Downloads: 0 This Week

Last Update: 2026-03-24
6

Sogou C++ Workflow

C++ parallel computing and asynchronous networking engine

As Sogou's C++ server engine, Sogou C++ Workflow supports almost all back-end C++ online services of Sogou, including all search services, cloud input method, online advertisements, etc., handling more than 10 billion requests every day. It is an enterprise-level programming engine with a light and elegant design that can satisfy most C++ back-end development requirements.

Downloads: 0 This Week

Last Update: 2026-02-09
7

Soufflé

Datalog variant for tool designers crafting analyses in Horn clauses

Rapid prototyping for your analysis problems with logic, enabling deep design-space explorations. Designed for large-scale static analysis, e.g., points-to analysis for Java, taint analysis, and security checks. Uses Futamura projections/partial evaluation for effective translation to parallel C++, optimized staged compilation, and specialized data structures for logical relations. Highlights: efficient translation of Datalog programs to parallel C++ (CAV'16, CC'16); efficient interpretation using de-specialization techniques (PLDI'21); specialized data structures for relations (PACT'19, PPoPP'19, PMAM'19) with optimal index selection (VLDB'18); extended Datalog semantics, e.g., permitting unbounded recursion with numbers and terms. ...

Downloads: 0 This Week

Last Update: 2025-03-24
8

nndeploy

An Easy-to-Use and High-Performance AI Deployment Framework

...The system supports multiple inference engines and hardware accelerators, allowing the same AI workflow to run on different platforms without significant modifications. nndeploy also includes performance optimization techniques such as parallel execution, memory reuse, and hardware-accelerated operations to improve inference speed.

Downloads: 1 This Week

Last Update: 2026-04-04
9

KuzuDB

Embeddable property graph database management system

KuzuDB is a high-performance graph database optimized for analytical queries, built from the ground up with a columnar storage engine. It is designed to efficiently process large-scale graph workloads, making it ideal for data science, machine learning, and knowledge graph applications.

Downloads: 4 This Week

Last Update: 2025-10-10
10

ncnn

High-performance neural network inference framework for mobile

ncnn is a high-performance neural network inference computing framework designed specifically for mobile platforms. It puts artificial intelligence at your fingertips with no third-party dependencies, and runs faster than all other known open source frameworks on mobile phone CPUs. ncnn allows developers to easily deploy deep learning models to mobile platforms and create intelligent apps. It is cross-platform and supports most commonly used CNN networks, including...

Downloads: 87 This Week

Last Update: 2026-01-13
11

ispc

Intel SPMD Program Compiler

...Under the SPMD model, the programmer writes a program that generally appears to be a regular serial program, though the execution model is actually that a number of program instances execute in parallel on the hardware. ispc compiles a C-based SPMD programming language to run on the SIMD units of CPUs and GPUs; it frequently provides a 3x or more speedup on architectures with 4-wide vector SSE units and 5x-6x on architectures with 8-wide AVX vector units, without any of the difficulty of writing intrinsics code. ispc also supports parallelization across multiple cores, making it possible to write programs whose performance scales with both the number of cores and the vector unit size. ...
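
The SPMD idea (one serial-looking body, many program instances) can be emulated in plain Python to show how work is divided. This is a toy model, not ispc code: the instances run one after another here, whereas ispc maps them onto SIMD lanes. The parameter names echo ispc's `programIndex` and `programCount` built-ins; `saxpy_instance` and `launch_gang` are hypothetical names.

```python
def saxpy_instance(program_index, program_count, a, x, y, out):
    """Kernel body written from the point of view of ONE program instance."""
    # This instance handles elements program_index, program_index + program_count, ...
    for i in range(program_index, len(x), program_count):
        out[i] = a * x[i] + y[i]


def launch_gang(kernel, program_count, *args):
    """Emulate a gang by running the same kernel once per program instance.

    The loop below is sequential and exists purely for illustration; on real
    hardware the instances of a gang would execute together on SIMD lanes.
    """
    for program_index in range(program_count):
        kernel(program_index, program_count, *args)


x = [1, 2, 3, 4, 5]
y = [10, 10, 10, 10, 10]
out = [0] * len(x)
launch_gang(saxpy_instance, 4, 2, x, y, out)  # gang of 4 instances, a = 2
```

Every element is written by exactly one instance, which is why the same body can be run for all instances without coordination.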

Downloads: 56 This Week

Last Update: 2026-02-04
12

TensorStore

Library for reading and writing large multi-dimensional arrays

...It separates the logical view (shape, dtype, chunking) from the physical layout so the same code can target Zarr, N5, TIFF pyramids, or custom backends. Rich indexing, slicing, and broadcasting operations make it feel like a familiar array API, while asynchronous I/O pipelines stream chunks efficiently in parallel. Transactional semantics allow atomic updates and consistent snapshots, which is essential for large, shared datasets used by ML and scientific workflows. The library is engineered for scalability—background caching, chunk sharding, and retryable operations keep throughput high even over unreliable networks. With language bindings, it fits into Python-heavy analysis pipelines while retaining a fast C++ core.

Downloads: 0 This Week

Last Update: 2026-03-12
13

Halide

A language for fast, portable data-parallel computation

Halide is a programming language for fast, portable data-parallel computation. It was designed to make writing high-performance image and array processing code much easier on modern machines. It works on all major operating systems and with several CPU architectures (X86, ARM, MIPS, Hexagon, PowerPC) and GPU Compute APIs (CUDA, OpenCL, OpenGL, among others). It is not a standalone programming language, however; rather, it is embedded in C++, which means that you write C++ code that builds an in-memory representation of a Halide pipeline using Halide's C++ API. ...

Downloads: 0 This Week

Last Update: 2025-09-17
14

Snapcast

Synchronous multiroom audio player

...It's not a standalone player, but an extension that turns your existing audio player into a Sonos-like multiroom solution. Audio is captured by the server and routed to the connected clients. Several players can feed audio to the server in parallel and clients can be grouped to play the same audio stream. One of the most generic ways to use Snapcast is in conjunction with the music player daemon (MPD) or Mopidy. The encoded chunks are sent via a TCP connection to the Snapclients. Each client does continuous time synchronization with the server, so that the client is always aware of the local server time. ...
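
One common way for a client to track a server's clock is an NTP-style exchange of timestamps. The sketch below assumes that approach and a roughly symmetric network delay; it is not taken from Snapcast's source, and the function names are hypothetical.

```python
from statistics import median


def clock_offset(t_client_send, t_server_recv, t_server_send, t_client_recv):
    """Estimate (server clock - client clock) from one request/response pair.

    The first and last timestamps are read on the client clock, the middle
    two on the server clock. Assumes the delay is equal in both directions.
    """
    return ((t_server_recv - t_client_send) + (t_server_send - t_client_recv)) / 2.0


def estimate_server_time(client_now, samples):
    """Convert a client timestamp to server time using the median offset.

    samples: list of 4-tuples in the argument order of clock_offset.
    The median damps the effect of occasional samples with asymmetric delay.
    """
    offsets = [clock_offset(*sample) for sample in samples]
    return client_now + median(offsets)
```

For example, with times in milliseconds, a request sent at client time 1000, received and answered at server times 6010 and 6020, and received back at client time 1030 gives an offset of (5010 + 4990) / 2 = 5000 ms.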

Downloads: 29 This Week

Last Update: 2026-03-10
15

CTranslate2

Fast inference engine for Transformer models

CTranslate2 is a C++ and Python library for efficient inference with Transformer models. The project implements a custom runtime that applies many performance optimization techniques, such as weight quantization, layer fusion, and batch reordering, to accelerate Transformer models and reduce their memory usage on CPU and GPU. Execution is significantly faster and requires fewer resources than general-purpose deep learning frameworks on supported models and tasks thanks to many...

Downloads: 13 This Week

Last Update: 2026-02-04
16

ChrysaLisp

Parallel OS, with GUI, Terminal, OO Assembler, Class libraries

ChrysaLisp is a 64-bit, MIMD, multi-CPU, multi-threaded, multi-core, multi-user parallel operating system with features such as a GUI, terminal, OO Assembler, class libraries, C-Script compiler, Lisp interpreter, debugger, profiler, vector font engine, and more. It runs on macOS, Windows, and Linux for x64, Riscv64, and Arm64, and will eventually move to bare metal. It also allows the modeling of various network topologies and the use of ChrysaLib hub nodes to join heterogeneous host networks. ...

Downloads: 0 This Week

Last Update: 2026-04-11
17

ClickHouse

A fast open-source OLAP database management system

ClickHouse® is a fast, open-source column-oriented database management system that can generate analytical data reports through SQL queries in real time. According to several independent benchmarks, it far outperforms comparable column-oriented database management systems, in some cases running up to 1000 times faster. It can process hundreds of millions to more than a billion rows, and tens of gigabytes of data, per server per second. Apart from its blazing speed, ClickHouse is highly...

Downloads: 23 This Week

Last Update: 4 days ago
18

SIMD

C++ wrappers for SIMD intrinsics

xsimd is a C++ library that provides portable abstractions over SIMD (Single Instruction, Multiple Data) instructions, enabling developers to write high-performance vectorized code without dealing directly with architecture-specific intrinsics. SIMD instructions allow a single operation to be applied to multiple data elements simultaneously, significantly accelerating numerical and data-parallel computations. However, differences across CPU architectures and compilers make direct usage complex, which xsimd addresses by offering a unified API that maps efficiently to underlying hardware capabilities. The library supports a wide range of instruction sets, including SSE, AVX, NEON, and WebAssembly SIMD, ensuring portability across platforms. ...

Downloads: 5 This Week

Last Update: 2026-04-12
19

TensorRT

C++ library for high performance inference on NVIDIA GPUs

...With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers, embedded, or automotive product platforms. TensorRT is built on CUDA®, NVIDIA’s parallel programming model, and enables you to optimize inference leveraging libraries, development tools, and technologies in CUDA-X™ for artificial intelligence, autonomous machines, high-performance computing, and graphics. With new NVIDIA Ampere Architecture GPUs, TensorRT also leverages sparse tensor cores providing an additional performance boost.

Downloads: 12 This Week

Last Update: 2026-03-25
20

ROOT

Analyzing, storing and visualizing big data, scientifically

...ROOT comes with histogramming capabilities in an arbitrary number of dimensions, curve fitting, statistical modeling, and minimization, to allow the easy setup of a data analysis system that can query and process the data interactively or in batch mode, as well as a general parallel processing framework, RDataFrame, that can considerably speed up an analysis.

Downloads: 3 This Week

Last Update: 2026-03-14
21

XGBoost

Scalable and Flexible Gradient Boosting

...It supports regression, classification, ranking and user-defined objectives, and runs on all major operating systems and cloud platforms. XGBoost works by implementing machine learning algorithms under the Gradient Boosting framework. It also offers parallel tree boosting (also known as GBDT, GBRT or GBM) that can quickly and accurately solve many data science problems. XGBoost can be used from Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and is capable of solving problems beyond billions of examples.

Downloads: 3 This Week

Last Update: 2026-02-10
22

PaddlePaddle

PArallel Distributed Deep LEarning: Machine Learning Framework

PaddlePaddle is an open source deep learning industrial platform with advanced technologies and a rich set of features that make innovation and application of deep learning easier. It is the only independent R&D deep learning platform in China, and has been widely adopted in various sectors including manufacturing, agriculture and enterprise service. PaddlePaddle covers core deep learning frameworks, basic model libraries, end-to-end development kits and more, with support for both...

Downloads: 1 This Week

Last Update: 2026-01-31
23

mold

A Modern Linker

Mold is a modern high-performance linker designed as a drop-in replacement for traditional Unix linkers, with a primary goal of dramatically reducing build times for large software projects. In compiled languages like C, C++, and Rust, the linking phase can become a significant bottleneck, especially in large codebases, and mold addresses this by leveraging highly optimized algorithms and extensive parallelism. It is capable of utilizing all available CPU cores efficiently, resulting in...

Downloads: 2 This Week

Last Update: 2026-04-13
24

ArrayFire

ArrayFire, a general purpose GPU library

ArrayFire is a general-purpose tensor library that simplifies the process of software development for the parallel architectures found in CPUs, GPUs, and other hardware acceleration devices. The library serves users in every technical computing market. Data structures in ArrayFire are smartly managed to avoid costly memory transfers and to take advantage of each performance feature provided by the underlying hardware. The community of ArrayFire developers invites you to build with us if you're interested and able to write top performing tensor functions. ...

Downloads: 1 This Week

Last Update: 2025-09-05
25

fairseq2

FAIR Sequence Modeling Toolkit 2

...It supports multi-GPU and multi-node distributed training using DDP, FSDP, and tensor parallelism, capable of scaling up to 70B+ parameter models. The framework integrates seamlessly with PyTorch 2.x features such as torch.compile, Fully Sharded Data Parallel (FSDP), and modern configuration management.

Downloads: 1 This Week

Last Update: 2026-03-26