Godot vs Jolt Parallel Comparison
Preface
The previous article, Godot Physics Source Code Analysis, walked through Godot’s built-in 3D physics pipeline. The natural next topic is the Jolt physics engine, which was integrated in version 4.4. I first planned a separate article introducing it, but on review Godot’s integration of Jolt turned out to be fairly thin. Functionally, it is aligned with, and limited to, Godot’s existing physics capabilities. For performance, an adapter layer routes Jolt’s calls back to Godot’s generic thread pool for parallel execution. So rather than describe Jolt as it exists inside Godot, it seems more instructive to look at the two engines independently and compare their approaches.
What differentiates physics engines the most? In my opinion it is not the solver algorithm (impulse-based, PBD) but the parallelization strategy: how constraint islands are built and partitioned, and how the thread pool is designed. These choices directly determine efficiency and can produce very different performance across platforms. The following sections compare the two engines on both points.
Constraint Construction
Let’s start with constraint construction, which is the simpler of the two, since Godot’s logic here is quite straightforward. As mentioned in the previous article, it is split into three separate code paths for Areas, RigidBodies, and SoftBodies. Simply put, the Area path just adds any associated object, since no computation is performed on it. The RigidBody path collects associated RigidBodies and SoftBodies. The SoftBody path collects only associated RigidBodies.
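To make the idea of a constraint island concrete, here is a minimal sketch that groups bodies into islands by flood-filling the graph of links between them. It is a generic connected-components pass, not Godot’s actual code: the function name `build_islands` and the representation of bodies as plain indices are assumptions made for illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns, for every body index, the id of the island it
// belongs to. Two bodies share an island when a chain of links connects them.
std::vector<int> build_islands(
        std::size_t num_bodies,
        const std::vector<std::pair<std::size_t, std::size_t>>& links) {
    // Adjacency list built from the pairwise links (contacts, joints, ...).
    std::vector<std::vector<std::size_t>> adjacent(num_bodies);
    for (const auto& link : links) {
        adjacent[link.first].push_back(link.second);
        adjacent[link.second].push_back(link.first);
    }

    std::vector<int> island(num_bodies, -1);  // -1 means "not visited yet"
    std::vector<std::size_t> stack;
    int next_island = 0;
    for (std::size_t start = 0; start < num_bodies; ++start) {
        if (island[start] != -1) continue;
        // Iterative depth-first flood fill from an unvisited body.
        island[start] = next_island;
        stack.push_back(start);
        while (!stack.empty()) {
            const std::size_t body = stack.back();
            stack.pop_back();
            for (std::size_t neighbour : adjacent[body]) {
                if (island[neighbour] == -1) {
                    island[neighbour] = next_island;
                    stack.push_back(neighbour);
                }
            }
        }
        ++next_island;
    }
    return island;
}
```

Each island can then be solved independently of the others, which is what makes islands the natural unit of parallel work.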
In contrast, Jolt’s handling is more involved. It implements graph partitioning with an 8-bit mask: 7 bits identify parallel groups and 1 bit marks the non-parallel group, which corresponds to a typical 8-core CPU. To improve parallel efficiency when dynamic splitting is enabled, Jolt splits any constraint group whose constraint count exceeds a split threshold (default: 128). It then examines each resulting split: one that falls below a merge threshold (default: 16) is merged into the non-parallel group, and otherwise it becomes a new parallel group. The detailed flow is as follows:
bool LargeIslandSplitter::SplitIsland()
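The partitioning described above can be sketched as a greedy colouring: each body carries a bit mask of the splits that already touch it, and a constraint goes into the lowest split that neither of its bodies uses yet. This is a simplified illustration, not Jolt’s implementation. The constants reproduce the numbers quoted in this article rather than values read from Jolt’s headers, and `Constraint`, `Splits` and `split_island` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Constraint { std::uint32_t body_a, body_b; };

// Assumed values, taken from the description in this article.
constexpr unsigned kNumSplits = 8;                      // one mask bit per split
constexpr unsigned kNonParallelSplit = kNumSplits - 1;  // last bit: serial group
constexpr std::size_t kSplitThreshold = 128;
constexpr std::size_t kMergeThreshold = 16;

// For each split, the indices of the constraints assigned to it.
using Splits = std::array<std::vector<std::size_t>, kNumSplits>;

// Returns false (leaving `out` empty) when the island is not large enough.
bool split_island(const std::vector<Constraint>& constraints,
                  std::size_t num_bodies, Splits& out) {
    for (auto& split : out) split.clear();
    if (constraints.size() <= kSplitThreshold) return false;

    std::vector<std::uint8_t> body_mask(num_bodies, 0);
    for (std::size_t i = 0; i < constraints.size(); ++i) {
        const Constraint& c = constraints[i];
        const unsigned used = body_mask[c.body_a] | body_mask[c.body_b];
        // Lowest parallel split that neither body participates in yet.
        unsigned split = 0;
        while (split < kNonParallelSplit && (used & (1u << split))) ++split;
        if (split < kNonParallelSplit) {
            const std::uint8_t bit = static_cast<std::uint8_t>(1u << split);
            body_mask[c.body_a] |= bit;
            body_mask[c.body_b] |= bit;
        }
        // If every parallel bit was taken, split == kNonParallelSplit here.
        out[split].push_back(i);
    }

    // Splits too small to be worth a parallel batch fall back to the serial one.
    std::vector<std::size_t>& serial = out[kNonParallelSplit];
    for (unsigned s = 0; s < kNonParallelSplit; ++s) {
        if (!out[s].empty() && out[s].size() < kMergeThreshold) {
            serial.insert(serial.end(), out[s].begin(), out[s].end());
            out[s].clear();
        }
    }
    return true;
}
```

Within one parallel split no two constraints share a body, so the constraints of that split can be solved concurrently without locking. A chain of bodies ends up alternating between two splits, while a star around one hub body quickly exhausts the parallel bits and collapses into the serial group.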
Constraint Solving
Next is the solving stage. As noted earlier, this discussion skips the solving algorithms themselves and focuses on thread pool scheduling. Godot uses a generic thread pool with high- and low-priority queues and waits on semaphores and condition variables, whereas Jolt uses a physics-specific thread pool that schedules jobs through a task dependency graph and synchronizes with barriers.
Godot Generic Thread Pool
The general workflow of Godot’s thread pool is: initialization, task submission, thread execution, cooperative waiting, cleanup. The specific flow is as follows:
1. Initialization Phase:
Basic Structure
The basic structure of Godot’s generic thread pool is roughly as follows, using a dual-queue system to balance responsiveness and throughput, and paged allocation to reduce memory fragmentation.
// Class structure and main members
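As a rough illustration of the dual-queue idea, the sketch below keeps two queues behind one mutex and always serves the high-priority queue first. It is not Godot’s `WorkerThreadPool`: the class name `DualQueue` and its methods are hypothetical, and the paged allocation and thread wake-up logic are left out.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical two-level task queue shared by all worker threads.
class DualQueue {
public:
    using Task = std::function<void()>;

    void push(Task task, bool high_priority) {
        std::lock_guard<std::mutex> lock(mutex_);
        (high_priority ? high_ : low_).push_back(std::move(task));
    }

    // Takes the next task, preferring the high-priority queue.
    // Returns false when both queues are empty.
    bool try_pop(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::deque<Task>& queue = !high_.empty() ? high_ : low_;
        if (queue.empty()) return false;
        out = std::move(queue.front());
        queue.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<Task> high_;  // latency-sensitive work
    std::deque<Task> low_;   // background / throughput work
};
```

Serving the high-priority queue first keeps latency-sensitive work responsive, while low-priority work fills whatever capacity is left.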
Cooperative Waiting
The most interesting aspect is likely the cooperative waiting mechanism: worker threads don’t idle while waiting but continue processing other tasks. This is particularly useful for handling task dependencies, effectively preventing deadlocks and improving CPU utilization.
void WorkerThreadPool::_wait_collaboratively(ThreadData *p_caller_pool_thread, Task *p_task) {
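The principle behind cooperative waiting can be shown with a small pool in which a thread that waits for a task keeps draining the queue instead of blocking. This is a minimal sketch under simplifying assumptions (one queue, one condition variable), and `CoopPool`, `Task` and `wait_for` are hypothetical names, not Godot’s API.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

struct Task {
    std::function<void()> fn;
    bool done = false;  // guarded by CoopPool's mutex
};
using TaskRef = std::shared_ptr<Task>;

class CoopPool {
public:
    explicit CoopPool(unsigned thread_count) {
        for (unsigned i = 0; i < thread_count; ++i)
            threads_.emplace_back([this] { worker_loop(); });
    }
    ~CoopPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            exiting_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    TaskRef add_task(std::function<void()> fn) {
        TaskRef task = std::make_shared<Task>();
        task->fn = std::move(fn);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(task);
        }
        cv_.notify_all();
        return task;
    }

    // Cooperative wait: while `target` is unfinished, run other queued tasks
    // on the calling thread; sleep only when there is nothing to run.
    void wait_for(const TaskRef& target) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!target->done) {
            if (!queue_.empty()) run_front(lock);
            else cv_.wait(lock);
        }
    }

private:
    // Precondition: `lock` is held and the queue is not empty.
    void run_front(std::unique_lock<std::mutex>& lock) {
        TaskRef task = std::move(queue_.front());
        queue_.pop_front();
        lock.unlock();  // never run user code while holding the lock
        task->fn();
        lock.lock();
        task->done = true;
        cv_.notify_all();  // wake threads waiting on this task
    }
    void worker_loop() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return exiting_ || !queue_.empty(); });
            if (queue_.empty()) return;  // exiting and fully drained
            run_front(lock);
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<TaskRef> queue_;
    bool exiting_ = false;
    std::vector<std::thread> threads_;  // declared last: starts after the rest
};
```

With a single worker thread, a parent task that submits a child and then waits for it would deadlock under a plain blocking wait, because the only worker is occupied by the parent. Here the waiting thread picks up the child itself, so the parent completes.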
Jolt Physics Thread Pool
Next is Jolt’s physics thread pool. Its general workflow is: initialization, task submission, thread execution, barrier synchronization, cleanup. The specific flow is as follows:
1. Initialization Phase:
Basic Structure
The basic structure of Jolt’s physics thread pool is roughly as follows. It uses a ring-based lock-free queue and fixed-size memory pools.
class JobSystemThreadPool final : public JobSystemWithBarrier {
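The dependency-graph scheduling mentioned earlier comes down to a counter per job: a job becomes runnable only once the number of unfinished jobs it depends on drops to zero. The sketch below shows that counting rule with a single-threaded driver. It is an illustration only: `Job`, `DependencyScheduler` and the explicit `dependents` list are assumptions of this sketch and do not mirror Jolt’s classes.

```cpp
#include <atomic>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

struct Job {
    std::function<void()> fn;
    std::atomic<std::uint32_t> pending_deps{0};  // jobs that must finish first
    std::vector<Job*> dependents;                // jobs waiting on this one
};

// Hypothetical scheduler that runs jobs on the calling thread in dependency
// order. A real job system would hand ready jobs to worker threads instead.
class DependencyScheduler {
public:
    // `after` may only start once `before` has finished.
    void add_dependency(Job& before, Job& after) {
        after.pending_deps.fetch_add(1);
        before.dependents.push_back(&after);
    }

    // Queue the job now if it has no unfinished dependencies; otherwise it
    // is queued later, when its last dependency completes.
    void submit(Job& job) {
        if (job.pending_deps.load() == 0) ready_.push_back(&job);
    }

    void run_all() {
        while (!ready_.empty()) {
            Job* job = ready_.front();
            ready_.pop_front();
            job->fn();
            for (Job* dependent : job->dependents) {
                // fetch_sub returns the previous value: 1 means the counter
                // has just reached zero and the dependent is ready.
                if (dependent->pending_deps.fetch_sub(1) == 1)
                    ready_.push_back(dependent);
            }
        }
    }

private:
    std::deque<Job*> ready_;
};
```

The counter is atomic because, in a multi-threaded system, several dependencies may finish on different threads at the same time, and exactly one of them must observe the transition to zero and enqueue the dependent job.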
Barrier Synchronization
In my view, Jolt’s barrier is a physics-specialized version of Godot’s cooperative waiting, suitable for synchronizing physics phases.
The barrier structure is as follows. Its read and write indices are aligned to separate cache lines, so that a core updating one index does not invalidate the cache line holding the other on a different core (false sharing).
class BarrierImpl : public Barrier {
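The effect of that alignment can be shown on a small ring buffer whose read and write indices each get their own cache line. This sketch is deliberately restricted to one producer thread and one consumer thread, which is simpler than what a job system needs. The 64-byte cache line size is an assumption, and `SpscRing` is a hypothetical name.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size in bytes

// Bounded ring buffer for exactly one producer and one consumer thread.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Called by the producer only. Returns false when the ring is full.
    bool push(const T& value) {
        const std::size_t write = write_index_.load(std::memory_order_relaxed);
        if (write - read_index_.load(std::memory_order_acquire) == N)
            return false;
        slots_[write & (N - 1)] = value;
        write_index_.store(write + 1, std::memory_order_release);
        return true;
    }

    // Called by the consumer only. Returns false when the ring is empty.
    bool pop(T& out) {
        const std::size_t read = read_index_.load(std::memory_order_relaxed);
        if (read == write_index_.load(std::memory_order_acquire))
            return false;
        out = slots_[read & (N - 1)];
        read_index_.store(read + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<T, N> slots_{};
    // Each index lives on its own cache line, so the producer writing
    // write_index_ does not evict the line holding read_index_, and vice versa.
    alignas(kCacheLine) std::atomic<std::size_t> read_index_{0};
    alignas(kCacheLine) std::atomic<std::size_t> write_index_{0};
};
```

The cost is some padding: the object grows by up to two cache lines in exchange for the two indices never sharing one.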
The logic for adding multiple tasks is as follows. The barrier supports batch task addition to reduce synchronization overhead:
void BarrierImpl::AddJobs(const JobHandle *inHandles, uint inNumHandles) {
The core waiting mechanism is as follows. It also employs cooperative waiting, balancing efficiency and correctness.
void BarrierImpl::Wait() {
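Batch addition and cooperative waiting on a barrier can be sketched together as follows. The waiting thread tries to claim and run every job registered with the barrier, and each job carries an atomic flag so that it runs exactly once even when worker threads compete for it. This is a simplified illustration rather than Jolt’s code: `Job`, `JobRef` and `Barrier` are hypothetical types, and the sketch yields where a real implementation would block on a semaphore.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

struct Job {
    explicit Job(std::function<void()> f) : fn(std::move(f)) {}

    // Claims the job and runs it. Returns false if another thread already
    // claimed it, so every job executes exactly once.
    bool try_execute() {
        if (claimed.exchange(true, std::memory_order_acq_rel)) return false;
        fn();
        done.store(true, std::memory_order_release);
        return true;
    }

    std::function<void()> fn;
    std::atomic<bool> claimed{false};
    std::atomic<bool> done{false};
};
using JobRef = std::shared_ptr<Job>;

class Barrier {
public:
    // Registers a whole batch in one call.
    void add_jobs(const std::vector<JobRef>& handles) {
        jobs_.insert(jobs_.end(), handles.begin(), handles.end());
    }

    // Returns once every registered job has finished. Instead of idling, the
    // caller executes any job nobody has claimed yet.
    void wait() {
        bool pending = true;
        while (pending) {
            pending = false;
            for (const JobRef& job : jobs_) {
                job->try_execute();
                if (!job->done.load(std::memory_order_acquire)) pending = true;
            }
            // Jobs claimed by other threads may still be running.
            if (pending) std::this_thread::yield();
        }
        jobs_.clear();
    }

private:
    std::vector<JobRef> jobs_;
};
```

Because `wait` returns only after every job has set `done`, it works as a phase boundary: everything submitted before the barrier is complete before the next physics phase starts.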
Summary
Comparing how Godot and the original Jolt parallelize physics shows a clear difference in design goals. Godot adopts a highly generic thread pool and task system, favoring consistency and maintainability across the whole engine. Jolt, in contrast, builds a dedicated job system, dependency graph, and barrier mechanism around the physics computation itself, to maximize parallel efficiency within the physics phase. In Godot’s current integration, Jolt’s tasks are ultimately adapted back onto Godot’s generic thread pool. This works functionally, but it means the advantages of Jolt’s native parallelization design are not fully leveraged.
In comparison, Unreal Engine (Chaos) decomposes physics tasks and integrates them into the engine-level scheduling system via a Task Graph, giving it stronger adaptability across different hardware platforms. This “task-centric rather than thread-centric” design makes it easier to fully utilize multi-core CPUs in complex scenarios. Truly, UE is unmatched. I give up.







