
Solving the ABA Problem in Rust with Hazard Pointers
- Introduction
Introduction
In our journey exploring solutions to the ABA problem in Rust, we’ve covered tagged pointers and epoch-based reclamation. In this third and final post of the series, we’ll examine hazard pointers – a technique that provides fine-grained protection for individual memory locations.
đź“ş Series Overview
This is the final post in our three-part series on solving the ABA problem in Rust:
- ✅ Part 1: Tagged Pointers with Versioning – We covered how to pair pointers with version numbers
- ✅ Part 2: Epoch-Based Reclamation – We explored using epochs for safe memory reclamation
- 🎯 Part 3: Hazard Pointers – Today’s post on using hazard pointers for precise memory protection
🔍 What Are Hazard Pointers?
Hazard pointers are a memory reclamation technique that protects specific memory addresses from being recycled while they’re in use. Unlike epoch-based reclamation, which protects all shared memory during an epoch, hazard pointers protect only explicitly marked locations.
Key concepts:
- Pointer Registration: Threads explicitly register pointers they’re currently using
- Per-Thread Protection: Each thread maintains its own list of hazard pointers
- Selective Reclamation: Memory is only reclaimed when no thread has registered it as hazardous
- Retirement Queue: Memory scheduled for deletion is first moved to a retirement queue
🛡️ How Hazard Pointers Work
Hazard pointers work through these key mechanisms:
- Declaration: Before accessing a shared pointer, a thread publishes it as a hazard pointer
- Validation: After publishing, the thread verifies the pointer is still valid
- Access: The thread can safely access the pointer’s data
- Retirement: When memory is no longer needed, it’s queued for deletion
- Scanning: Before reclaiming memory, all threads’ hazard pointer lists are scanned
- Reclamation: Only memory not present in any hazard list is actually freed
The diagram illustrates the step-by-step process of how hazard pointers work to protect memory from premature reclamation:
Thread 2 Protection Phase:
- Thread 2 sets a hazard pointer to memory location A
- Thread 2 reads from A safely
- Memory status shows A is now protected
Thread 1 Removal Phase:
- Thread 1 removes A from the primary data structure
- Thread 1 moves A to a retirement queue
- Memory status shows A is now in retirement queue, but not yet freed
First Reclamation Attempt:
- Thread 1 scans all hazard pointer lists
- Thread 1 sees A is marked as hazardous by Thread 2
- Memory status shows A remains in the retirement queue
Protection Release:
- Thread 2 finishes its work and clears its hazard pointer to A
- Memory status is updated to reflect this change
Final Reclamation:
- Thread 1 scans hazard lists again
- Thread 1 sees A is no longer hazardous
- Memory status shows A can now be reclaimed Thread 1 safely frees memory location A
⚙️ Implementation in Rust
Let’s implement a lock-free stack with hazard pointers:
use std::collections::HashSet;
use std::fmt;
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, ThreadId};
First, let’s define our HazardPointers
type:
/// A thread-local hazard pointer registry
///
/// This struct maintains a list of pointers that a thread is currently using,
/// protecting them from being reclaimed by other threads.
pub struct HazardPointers<T> {
/// Map from thread ID to list of hazard pointers
thread_hazards: Mutex<Vec<(ThreadId, *mut T)>>,
/// Global retirement list of nodes awaiting safe reclamation
retire_list: Mutex<Vec<*mut T>>,
}
// Safety: HazardPointers can be safely shared between threads because
// all its mutations are protected by internal mutexes
unsafe impl<T> Send for HazardPointers<T> {}
unsafe impl<T> Sync for HazardPointers<T> {}
Now let’s implement the core functionality for managing hazard pointers:
impl<T> HazardPointers<T> {
/// Creates a new hazard pointer registry
pub fn new() -> Self {
HazardPointers {
thread_hazards: Mutex::new(Vec::new()),
retire_list: Mutex::new(Vec::new()),
}
}
/// Registers a hazard pointer for the current thread
///
/// This protects the given pointer from being reclaimed by other threads
/// until explicitly cleared with clear_hazards().
pub fn protect(&self, ptr: *mut T) -> *mut T {
if !ptr.is_null() {
let thread_id = thread::current().id();
let mut hazards = self
.thread_hazards
.lock()
.expect("Failed to lock hazard list - mutex poisoned");
// Check if we already have an entry for this thread
for entry in hazards.iter_mut() {
if entry.0 == thread_id {
entry.1 = ptr;
return ptr;
}
}
// No existing entry, add a new one
hazards.push((thread_id, ptr));
}
ptr
}
/// Clears all hazard pointers for the current thread
///
/// This should be called when the thread no longer needs to access
/// previously protected pointers.
pub fn clear_hazards(&self) {
let thread_id = thread::current().id();
let mut hazards = self
.thread_hazards
.lock()
.expect("Failed to lock hazard list - mutex poisoned");
hazards.retain(|entry| entry.0 != thread_id);
}
}
Next, we implement memory retirement and reclamation:
/// Adds a pointer to the retirement list for later reclamation
///
/// The memory will be reclaimed when it's safe to do so (i.e., when no thread
/// has it marked as hazardous).
pub fn retire(&self, ptr: *mut T) {
if !ptr.is_null() {
let mut retire = self
.retire_list
.lock()
.expect("Failed to lock retire list - mutex poisoned");
retire.push(ptr);
// Attempt to reclaim memory if retire list is getting large
if retire.len() > 10 {
self.try_reclaim(false);
}
}
}
/// Attempts to reclaim memory from the retirement list
///
/// This scans all hazard pointers across all threads and only reclaims
/// memory that isn't protected by any thread.
pub fn try_reclaim(&self, force: bool) -> usize {
// Get the current set of hazardous pointers
let hazards = self
.thread_hazards
.lock()
.expect("Failed to lock hazard list - mutex poisoned");
let hazardous: HashSet<*mut T> = hazards.iter().map(|entry| entry.1).collect();
// Get the retirement list
let mut retire = self
.retire_list
.lock()
.expect("Failed to lock retire list - mutex poisoned");
// If the retire list is empty or too small and we're not forcing reclamation, do nothing
if retire.is_empty() || (!force && retire.len() <= 5) {
return 0;
}
// Separate nodes that are safe to reclaim from those that are still hazardous
let (to_free, still_hazardous): (Vec<*mut T>, Vec<*mut T>) =
retire.drain(..).partition(|ptr| !hazardous.contains(ptr));
// Update the retirement list with nodes that couldn't be freed yet
*retire = still_hazardous;
// Count how many nodes we freed
let freed_count = to_free.len();
// Free the safe nodes
for ptr in to_free {
unsafe {
let _ = Box::from_raw(ptr);
}
}
freed_count
}
}
impl<T> Drop for HazardPointers<T> {
fn drop(&mut self) {
// Final reclamation attempt to free everything
self.try_reclaim(true);
// If there are still pointers in the retire list, that means they're
// still protected by some thread, which is a bug (memory leak)
let retire = self
.retire_list
.lock()
.expect("Failed to lock retire list - mutex poisoned");
if !retire.is_empty() {
// Just log a warning in a real application you might want to panic
eprintln!("Warning: HazardPointers dropped with {} items still in retire list. This is a memory leak.", retire.len());
}
}
}
Now let’s define the Node
type for our lock-free stack:
/// A node in our lock-free stack
pub struct Node<T> {
/// The value stored in this node
pub value: T,
/// Pointer to the next node in the stack
pub next: *mut Node<T>,
}
impl<T: fmt::Debug> fmt::Debug for Node<T> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("Node")
.field("value", &self.value)
.field("next", &self.next)
.finish()
}
}
Let’s define the actual LockFreeStack
structure:
/// A lock-free stack using hazard pointers for memory reclamation
///
/// This implementation is thread-safe and prevents the ABA problem
/// through the use of hazard pointers.
pub struct LockFreeStack<T> {
/// Atomic pointer to the head of the stack
pub head: AtomicPtr<Node<T>>,
/// Hazard pointer registry used to protect nodes from reclamation
pub hazard_pointers: Arc<HazardPointers<Node<T>>>,
/// Counter tracking the current size of the stack
size: AtomicUsize,
/// Whether to print debug information
verbose: bool,
}
Now let’s implement the stack operations:
impl<T> LockFreeStack<T> {
/// Creates a new empty stack
pub fn new(verbose: bool) -> Self {
LockFreeStack {
head: AtomicPtr::new(ptr::null_mut()),
hazard_pointers: Arc::new(HazardPointers::new()),
size: AtomicUsize::new(0),
verbose,
}
}
/// Pushes a value onto the stack
pub fn push(&self, value: T) -> Result<(), String> {
// Create a new node
let new_node = Box::into_raw(Box::new(Node {
value,
next: ptr::null_mut(),
}));
loop {
// Get the current head with Acquire ordering
let current_head = self.head.load(Ordering::Acquire);
// Point our new node to the current head
unsafe {
(*new_node).next = current_head;
}
if self.verbose {
println!(
"Attempting to push node: {:p} with next pointing to: {:p}",
new_node, current_head
);
}
// Try to update the head to our new node
match self.head.compare_exchange(
current_head,
new_node,
Ordering::Release,
Ordering::Relaxed,
) {
Ok(_) => {
// Successfully pushed the node
self.size.fetch_add(1, Ordering::Relaxed);
if self.verbose {
println!("Successfully pushed node: {:p}", new_node);
}
return Ok(());
}
Err(actual_head) => {
// Failed to push, try again with the updated head
if self.verbose {
println!(
"Push conflict detected! Expected head: {:p}, actual head: {:p}",
current_head, actual_head
);
}
unsafe {
(*new_node).next = actual_head;
}
}
}
}
}
}
The pop operation is the most important part for ABA prevention:
/// Pops a value from the stack
pub fn pop(&self) -> Option<T> {
loop {
// Get the current head
let current_head = self.head.load(Ordering::Acquire);
if current_head.is_null() {
// Stack is empty
if self.verbose {
println!("Stack is empty, cannot pop");
}
return None;
}
if self.verbose {
println!("Attempting to pop head: {:p}", current_head);
}
// CRITICAL STEP 1: Mark this pointer as hazardous before accessing it
let protected_head = self.hazard_pointers.protect(current_head);
// CRITICAL STEP 2: Check if the head has changed since we loaded it
// This is crucial for ABA prevention - if head changed, retry
if self.head.load(Ordering::Acquire) != current_head {
if self.verbose {
println!("Head changed during protection, retrying pop");
}
continue;
}
// Now safe to access the pointer
let next = unsafe { (*protected_head).next };
// Try to update the head to the next node
match self.head.compare_exchange(
current_head,
next,
Ordering::Release,
Ordering::Relaxed,
) {
Ok(_) => {
// Successfully popped the node
let value = unsafe {
// Move out the value
let v = std::ptr::read(&(*protected_head).value);
v
};
self.size.fetch_sub(1, Ordering::Relaxed);
if self.verbose {
println!(
"Successfully popped head: {:p}, new head: {:p}",
protected_head, next
);
}
// Clear hazard pointer and schedule node for reclamation
self.hazard_pointers.clear_hazards();
self.hazard_pointers.retire(protected_head);
return Some(value);
}
Err(_) => {
// Failed to pop, retry
if self.verbose {
println!("Pop conflict detected! Head changed during CAS");
}
continue;
}
}
}
}
/// Returns the current size of the stack
pub fn len(&self) -> usize {
self.size.load(Ordering::Relaxed)
}
/// Returns true if the stack is empty
pub fn is_empty(&self) -> bool {
self.len() == 0
}
}
/// Clean up resources when the stack is dropped
impl<T> Drop for LockFreeStack<T> {
fn drop(&mut self) {
// Pop all elements to ensure memory is freed
while self.pop().is_some() {}
// Final reclamation attempt
self.hazard_pointers.try_reclaim(true);
}
}
Let’s demonstrate the ABA problem prevention with a simpler example:
use std::time::Duration;
fn aba_demonstration() {
println!("\nDemonstrating ABA problem prevention with hazard pointers...");
let stack = Arc::new(LockFreeStack::new(true));
// Initial state: Push 1, 2, 3
stack.push(1).expect("Push should succeed");
stack.push(2).expect("Push should succeed");
stack.push(3).expect("Push should succeed");
println!("Initial stack state: [3] → [2] → [1]");
let stack_clone1 = Arc::clone(&stack);
let stack_clone2 = Arc::clone(&stack);
// Thread 1: Will try to pop 3 and then get delayed
let handle1 = thread::spawn(move || {
println!("Thread 1: Starting operation");
// Start the pop operation but pause in the middle
let head = stack_clone1.head.load(Ordering::Acquire);
stack_clone1.hazard_pointers.protect(head);
println!("Thread 1: Protected head (value 3)");
// Simulate delay
thread::sleep(Duration::from_millis(200));
// Try to complete the pop operation
println!("Thread 1: Continuing operation after delay");
let result = stack_clone1.pop();
println!("Thread 1: Pop result: {:?}", result);
});
// Thread 2: Will perform operations while Thread 1 is sleeping
let handle2 = thread::spawn(move || {
thread::sleep(Duration::from_millis(50));
println!("Thread 2: Performing operations while Thread 1 is delayed");
// Pop 3
let val = stack_clone2.pop().expect("Stack should have value 3");
println!("Thread 2: Popped {}", val);
// Pop 2
let val = stack_clone2.pop().expect("Stack should have value 2");
println!("Thread 2: Popped {}", val);
// Push 3 again - this would cause ABA without hazard pointers!
stack_clone2.push(3).expect("Push should succeed");
println!("Thread 2: Pushed 3 back");
});
handle1.join().expect("Thread 1 panicked");
handle2.join().expect("Thread 2 panicked");
println!("\nFinal stack state:");
while let Some(val) = stack.pop() {
println!("Value: {}", val);
}
}
🧪 Testing the Implementation
Here are key tests to validate our implementation:
#[cfg(test)]
mod tests {
use super::*;
use std::sync::Arc;
use std::thread;
use std::time::Duration;
#[test]
fn test_basic_operations() {
let stack = LockFreeStack::new(false);
// Test push and pop
stack.push(1).expect("Push should succeed");
stack.push(2).expect("Push should succeed");
stack.push(3).expect("Push should succeed");
assert_eq!(stack.len(), 3);
assert_eq!(stack.pop(), Some(3));
assert_eq!(stack.pop(), Some(2));
assert_eq!(stack.pop(), Some(1));
assert_eq!(stack.pop(), None);
assert!(stack.is_empty());
}
#[test]
fn test_aba_prevention() {
let stack = Arc::new(LockFreeStack::new(false));
// Initial state: push two items
stack.push(1).expect("Push should succeed");
stack.push(2).expect("Push should succeed");
let stack_clone1 = Arc::clone(&stack);
let stack_clone2 = Arc::clone(&stack);
// Thread 1: Begin pop operation but pause in the middle
let handle1 = thread::spawn(move || {
// Begin pop operation and protect head
let head = stack_clone1.head.load(Ordering::Acquire);
stack_clone1.hazard_pointers.protect(head);
// Pause to allow Thread 2 to run
thread::sleep(Duration::from_millis(100));
// Try to complete the pop operation - should succeed safely
// even after Thread 2 causes an ABA situation
let result = stack_clone1.pop();
stack_clone1.hazard_pointers.clear_hazards();
result
});
// Thread 2: Perform operations while Thread 1 is paused
let handle2 = thread::spawn(move || {
thread::sleep(Duration::from_millis(50));
// Pop both values
let val1 = stack_clone2.pop().expect("First pop should succeed");
let val2 = stack_clone2.pop().expect("Second pop should succeed");
// Push them in reverse order (causing ABA)
stack_clone2.push(val1).expect("Push should succeed");
stack_clone2.push(val2).expect("Push should succeed");
});
// Wait for both threads to complete
let thread1_result = handle1.join().expect("Thread 1 panicked");
handle2.join().expect("Thread 2 panicked");
// Thread 1's pop should have succeeded without causing issues
assert!(thread1_result.is_some());
}
}
🏢 Practical Applications
Hazard pointers are particularly well-suited for several real-world scenarios:
1. Database Systems
In database engines, buffer pool managers use hazard pointers to protect pages:
struct BufferPoolManager {
page_table: HashMap<PageId, *mut Page>,
hazard_registry: Arc<HazardPointers<Page>>,
}
impl BufferPoolManager {
fn get_page(&self, page_id: PageId) -> Option<ProtectedPage> {
loop {
// Find the page pointer
let page_ptr = self.page_table.get(&page_id)?;
// Protect it before access
let protected = self.hazard_registry.protect(*page_ptr);
// Validate it's still the same page (key ABA prevention)
if unsafe { (*protected).id != page_id } {
continue; // Page was replaced, retry
}
return Some(ProtectedPage::new(protected, &self.hazard_registry));
}
}
fn evict_page(&mut self, page_id: PageId) {
if let Some(page_ptr) = self.page_table.remove(&page_id) {
self.hazard_registry.retire(page_ptr);
}
}
}
2. Network Packet Processing
High-performance network stacks benefit from hazard pointers for zero-copy packet handling:
struct NetworkProcessor {
rx_queue: LockFreeQueue<Packet>,
hazard_registry: Arc<HazardPointers<Packet>>,
}
impl NetworkProcessor {
fn process_packets(&self) {
while let Some(packet_ptr) = self.rx_queue.dequeue() {
// Protect packet while processing
let protected = self.hazard_registry.protect(packet_ptr);
self.process_single_packet(protected);
// Release protection and return to pool
self.hazard_registry.clear_hazards();
self.hazard_registry.retire(packet_ptr);
}
}
}
3. Task Scheduling Systems
Work-stealing schedulers like in Rayon use hazard pointers to safely access tasks:
struct WorkStealer {
queues: Vec<Arc<TaskDeque>>,
hazard_registry: Arc<HazardPointers<TaskBuffer>>,
}
impl WorkStealer {
fn steal_task(&self) -> Option<Task> {
// Pick a random victim queue
let victim = &self.queues[rand::random::<usize>() % self.queues.len()];
// Get the buffer pointer
let buf_ptr = victim.buffer.load(Ordering::Acquire);
if buf_ptr.is_null() {
return None;
}
// Protect and validate
let protected = self.hazard_registry.protect(buf_ptr);
if victim.buffer.load(Ordering::Acquire) != buf_ptr {
return None; // Buffer changed, abort
}
// Try to steal a task
let result = unsafe { (*protected).try_steal() };
self.hazard_registry.clear_hazards();
result
}
}
4. Real-world Examples
Several projects use hazard pointers in production:
- Facebook’s Folly Library: Implements hazard pointers for C++ services
- Junction: A concurrent data structure library with hazard pointer support
- libcds: Lock-free containers library with hazard pointer memory reclamation
đź”’ Safety Considerations
Hazard pointers require careful implementation for memory safety:
- The Critical Load-Protect-Validate Pattern
// REQUIRED PATTERN:
// 1. Load the pointer
let ptr = container.head.load(Ordering::Acquire);
// 2. Protect it
hazard_registry.protect(ptr);
// 3. Validate it's still valid (crucial for ABA prevention)
if container.head.load(Ordering::Acquire) != ptr {
// Validation failed - must retry from the beginning
continue;
}
// Now safe to dereference
let value = unsafe { (*ptr).value };
- Memory Ordering Requirements
// Acquire load ensures visibility of prior writes
let head = self.head.load(Ordering::Acquire);
// AcqRel for successful CAS ensures all memory operations are properly ordered
match self.head.compare_exchange(
current, next,
Ordering::AcqRel, // Success ordering
Ordering::Acquire, // Failure ordering
) {...}
- Safe Reclamation Process
// Get current hazard list atomically
let hazards = mutex_guard.get_all_hazard_pointers();
// Only free nodes not in any thread's hazard list
for ptr in retire_list {
if !hazards.contains(ptr) {
unsafe { Box::from_raw(ptr); }
} else {
still_hazardous.push(ptr); // Keep for later reclamation
}
}
The safety of hazard pointers relies on four key mechanisms:
- Synchronization: Atomic operations and mutex guards
- Load-Validate-Use Pattern: Crucial validation step after protecting a pointer
- Reclamation Scanning: Checking against all hazard pointers before freeing memory
- Memory Barriers: Appropriate ordering for atomic operations
When implemented correctly, hazard pointers provide:
- Protection against use-after-free bugs
- Prevention of ABA problems
- Deterministic memory reclamation
- Fine-grained control over memory protection
- Bounded memory usage
⚖️ Comparison with Arc
Rust’s standard library provides Arc<T>
for safe sharing between threads.
Here’s how it compares with hazard pointers:
Feature | Hazard Pointers | Arc |
---|---|---|
Memory Safety | Manual protection | Automatic reference counting |
Atomicity | Manually managed | Built-in |
Flexibility | High (custom protection) | Limited (whole object) |
Granularity | Per-pointer | Per-allocation |
Reclamation Timing | Deferred, batched | Immediate on last drop |
Example comparison:
// Using Arc<T> - simple but adds overhead to every access
let shared_data = Arc::new(Data { value: 42 });
let clone = Arc::clone(&shared_data);
thread::spawn(move || {
println!("Value: {}", clone.value);
// Arc automatically cleans up when last reference is dropped
});
// Using hazard pointers - more complex but potentially more efficient
let data_ptr = Box::into_raw(Box::new(Data { value: 42 }));
let registry = Arc::new(HazardPointers::new());
thread::spawn(move || {
// Protect before access (critical)
let protected = registry.protect(data_ptr);
println!("Value: {}", unsafe { (*protected).value });
// Must manually clean up
registry.clear_hazards();
registry.retire(data_ptr);
});
When to use each:
-
Use Arc
when: - Safety is your primary concern
- You want the simplest possible implementation
- Performance isn’t critical in your hot path
-
Use Hazard Pointers when:
- You need maximum performance for lock-free algorithms
- You want fine-grained control over memory protection
- You’re implementing complex concurrent data structures
⚠️ Common Pitfalls
When implementing hazard pointers, watch out for these common mistakes:
1. Missing Validation Step
// INCORRECT: Missing validation after protection
let head_ptr = stack.head.load(Ordering::Acquire);
hazard_registry.protect(head_ptr);
// Error: head_ptr might be invalid now!
let next = unsafe { (*head_ptr).next };
// CORRECT: With validation
let head_ptr = stack.head.load(Ordering::Acquire);
hazard_registry.protect(head_ptr);
if stack.head.load(Ordering::Acquire) != head_ptr {
continue; // Must retry
}
let next = unsafe { (*head_ptr).next };
2. Memory Ordering Bugs
// INCORRECT: Relaxed ordering doesn't guarantee visibility
let head_ptr = stack.head.load(Ordering::Relaxed);
// CORRECT: Acquire ordering ensures visibility
let head_ptr = stack.head.load(Ordering::Acquire);
3. Resource Leaks
// INCORRECT: Forgetting to clear hazards
fn process_item(ptr: *mut Item) {
hazard_registry.protect(ptr);
// Process item...
// Oops! Forgot to clear hazard and memory will never be reclaimed
}
// CORRECT: Always clear hazards when done
fn process_item(ptr: *mut Item) {
hazard_registry.protect(ptr);
// Process item...
hazard_registry.clear_hazards();
}
4. Debugging Tips
When debugging hazard pointer implementations, add debug-mode validation checks for hazard pointer operations and use logging to track pointer registrations and retirements. For complex issues, consider implementing a reference-counted shadow copy in debug mode to verify correctness.
đź”— Resources
- Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects (Research Paper)
- Demo project: hazard-pointers-demo
- Memory Models for Rust Concurrency
- Facebook’s Folly Library Hazard Pointers
🤔 Final Thoughts
Hazard pointers offer a powerful approach to solving the ABA problem with fine-grained control.While they require more complex implementation than epoch-based reclamation, they provide better control over memory protection and reclamation timing.
The three techniques we’ve explored in this series represent different trade-offs:
- Tagged pointers are simple but platform-dependent
- Epoch-based reclamation is easier to implement but has coarser protection granularity
- Hazard pointers provide fine-grained control but with more complex bookkeeping
By understanding these approaches, you can choose the most appropriate technique for your specific concurrent Rust application.