Skip to content
Go back

Deque Dojo Part 4: Unleashing the Thieves

Suggest an edit

Part 1 Part 2 Part 3 Part 4

This final chapter completes our journey. We will implement the steal_top function, putting the final gear in place. Then, we will stand back and analyze the machine as a whole, observing the beautiful and intricate dance of memory fences that makes the entire lock-free algorithm safe.

Implementing steal_top: The Thief’s Gambit

A thief’s logic is a mirror image of the owner’s pop. It must also contend for the last item, not just with other thieves, but with the owner. To do this safely, it must participate in the same global ordering protocol established in pop_bottom.

template<typename T>
bool WSDeque<T>::steal_top(T* out) {
    // 1. Load `top`. An acquire load ensures we see the latest value of `top`
    // as updated by other competing thieves.
    std::uint64_t t = m_top.load(std::memory_order_acquire);

    // 2. FENCE: The thief's half of the global ordering agreement. This
    // `seq_cst` fence synchronizes with the fence in `pop_bottom`. It
    // ensures that in the critical race for the last item, the two
    // operations are strictly ordered relative to each other.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    // 3. Load `bottom`. The acquire semantic synchronizes with the owner's
    // `release` store in `push_bottom`, guaranteeing that if we see an
    // increased `bottom`, we also see the item that was written to the
    // buffer.
    std::uint64_t b = m_bottom.load(std::memory_order_acquire);

    if (t < b) {
        // Deque appears to have at least one item.
        // It's safe to read from the buffer now, before the CAS.
        T item = m_buffer[t & m_mask];
        
        // 4. Attempt to claim the item. The `seq_cst` CAS is the final,
        // incorruptible referee. It participates in the single global
        // timeline and ensures that only one operation, this one, another
        // thief's, or the owner's in pop_bottom, can succeed.
        if (m_top.compare_exchange_strong(t, t + 1,
                std::memory_order_seq_cst, std::memory_order_relaxed)) {
            // Success! The item is ours.
            *out = item;
            return true;
        }
    }
    
    // The deque was empty, or we lost the CAS race.
    return false;
}

The Grand Unification: How It All Works Together

Our deque is complete. The owner pushes and pops from the bottom, while thieves steal from the top. Most of the time, they work on opposite ends and never interact. But when the deque is nearly empty, they collide. Let’s analyze how our carefully placed fences make these interactions safe.

Interaction 1: Producer-Consumer (Push ↔ Steal)

This is the simplest case, the classic “Mailbox Handshake.”

Interaction 2: The Multi-Thief Heist (Steal ↔ Steal)

What happens if two thieves try to steal the very last item at the exact same time?

Interaction 3: The Critical Race (Pop ↔ Steal)

This is the heart of the algorithm’s correctness: the owner and a thief race for the last item.

Building the Work-Stealing Pool

Victim Selection & Per-Thread Randomness

A good scheduler needs a smart way for idle threads to choose a victim. Random selection is a robust general-purpose strategy. To avoid creating a new bottleneck, never use a single, shared random number generator with a mutex. The correct solution is a thread_local generator, giving each thread its own private instance with zero contention.

The Art of Idleness: Backoff Strategies

When a thread fails to find work, std::this_thread::yield() is a simple choice. But a better one is exponential backoff:

  1. If a thread fails to find work, it waits for a tiny duration.
  2. If it fails again, it doubles the wait time, up to a maximum.
  3. On success, the wait time resets. This keeps the system responsive when busy and quiet when idle.

Conclusion: The End of the Dojo

Our journey is complete. We started with the simple problem of a shared queue and progressed step-by-step to one of the most elegant and powerful data structures in concurrent programming. You have met the Out of Order Monster, and used the fundamental principles of memory ordering to tame it. The dojo is now yours.


Suggest an edit
Share this post on:

Next Post
Deque Dojo Part 3: The Owner's Lock-Free Fast Path