From: Jianzhou Zhao
Date: Wed Mar 11 2026 - 04:20:11 EST
Subject: [BUG] lib/maple_tree: KCSAN: data-race in mas_topiary_replace / mtree_range_walk
Dear Maintainers,
We are writing to report a KCSAN-detected data race in `lib/maple_tree.c`, found by our custom fuzzing tool, RacePilot. The race occurs during node lifecycle management in the maple tree: `mas_topiary_replace()` (via `mte_set_node_dead()`) marks a node as dead with a plain write to `node->parent`, while a concurrent RCU-protected walker in `mtree_range_walk()` checks for node death in `ma_dead_node()` with an unannotated read of the same `node->parent` field. We observed this bug on Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.
Call Trace & Context
==================================================================
BUG: KCSAN: data-race in mas_topiary_replace / mtree_range_walk
write to 0xffff88800c9f3e00 of 8 bytes by task 43414 on cpu 1:
mte_set_node_dead home/kfuzz/linux/lib/maple_tree.c:335 [inline]
mas_put_in_tree home/kfuzz/linux/lib/maple_tree.c:1571 [inline]
mas_topiary_replace+0x14e/0x14a0 home/kfuzz/linux/lib/maple_tree.c:2350
mas_wmb_replace home/kfuzz/linux/lib/maple_tree.c:2443 [inline]
mas_split home/kfuzz/linux/lib/maple_tree.c:3067 [inline]
mas_commit_b_node home/kfuzz/linux/lib/maple_tree.c:3087 [inline]
mas_wr_bnode+0xd2a/0x23b0 home/kfuzz/linux/lib/maple_tree.c:3755
mas_wr_store_entry+0x77b/0x1120 home/kfuzz/linux/lib/maple_tree.c:3787
mas_store_prealloc+0x47c/0xa60 home/kfuzz/linux/lib/maple_tree.c:5191
...
__x64_sys_mmap+0x71/0xa0 home/kfuzz/linux/arch/x86/kernel/sys_x86_64.c:82
read to 0xffff88800c9f3e00 of 8 bytes by task 43413 on cpu 0:
ma_dead_node home/kfuzz/linux/lib/maple_tree.c:576 [inline]
mtree_range_walk+0x11e/0x630 home/kfuzz/linux/lib/maple_tree.c:2594
mas_state_walk home/kfuzz/linux/lib/maple_tree.c:3313 [inline]
mas_walk+0x2a4/0x400 home/kfuzz/linux/lib/maple_tree.c:4617
lock_vma_under_rcu+0xd3/0x710 home/kfuzz/linux/mm/mmap_lock.c:238
do_user_addr_fault home/kfuzz/linux/arch/x86/mm/fault.c:1327 [inline]
handle_page_fault home/kfuzz/linux/arch/x86/mm/fault.c:1476 [inline]
exc_page_fault+0x294/0x10d0 home/kfuzz/linux/arch/x86/mm/fault.c:1532
...
value changed: 0xffff8880330fec41 -> 0xffff88800c9f3e00
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 43413 Comm: syz.0.3576 Not tainted 6.18.0-08691-g2061f18ad76e-dirty #42 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
==================================================================
Execution Flow & Code Context
The race is between marking maple tree nodes as obsolete and concurrent RCU readers still traversing them. During VMA modifications (such as `sys_mmap`), writers invoke `mas_wr_store_entry()`; stores frequently require splitting a tree node. A split replaces the obsolete nodes via `mas_topiary_replace()`, which retires each old node by pointing its parent link back at itself in `mte_set_node_dead()`:
```c
// lib/maple_tree.c
static inline void mte_set_node_dead(struct maple_enode *mn)
{
	mte_to_node(mn)->parent = ma_parent_ptr(mte_to_node(mn)); /* <-- plain concurrent write */
	smp_wmb(); /* Needed for RCU */
}
```
Meanwhile, a concurrent page fault (`do_user_addr_fault`) can look up the VMA locklessly under RCU via `lock_vma_under_rcu()`, which walks the maple tree through `mtree_range_walk()`. During this walk, readers verify that each node they step onto is not dead by checking for the cyclic parent link in `ma_dead_node()`:
```c
// lib/maple_tree.c
static __always_inline bool ma_dead_node(const struct maple_node *node)
{
	struct maple_node *parent;

	/* Do not reorder reads from the node prior to the parent check */
	smp_rmb();
	parent = (void *)((unsigned long)node->parent & ~MAPLE_NODE_MASK); /* <-- plain concurrent read */
	return (parent == node);
}
```
Root Cause Analysis
The KCSAN report arises because one thread assigns `node->parent` with a plain store during topiary teardown while a lockless RCU reader loads the same field with a plain read; neither access carries a `WRITE_ONCE()`/`READ_ONCE()` annotation. The `smp_wmb()`/`smp_rmb()` pair only orders the accesses around the parent check; without marked accesses, the compiler retains the leeway to tear, fuse, or re-load the pointer.
Unfortunately, we were unable to generate a reproducer for this bug.
Potential Impact
This data race can undermine the RCU lookup protocol under sufficiently aggressive compiler optimization. A torn or stale load of the parent pointer during the cyclic-link check in `lock_vma_under_rcu()` could make the walker misjudge whether a node is dead, yielding an incorrect lookup or a spurious fault-handling failure. In practice this would most likely manifest as transient faults when mapping virtual memory pages under high threading contention.
Proposed Fix
To fix this race and conform to the Linux Kernel Memory Model, the write in `mte_set_node_dead()` that terminates the parent link should use `WRITE_ONCE()`, and the corresponding read in `ma_dead_node()` should use `READ_ONCE()` so the pointer is snapshotted safely against racing mutation.
```diff
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -332,7 +332,7 @@ static inline struct maple_node *mas_mn(const struct ma_state *mas)
 static inline void mte_set_node_dead(struct maple_enode *mn)
 {
-	mte_to_node(mn)->parent = ma_parent_ptr(mte_to_node(mn));
+	WRITE_ONCE(mte_to_node(mn)->parent, ma_parent_ptr(mte_to_node(mn)));
 	smp_wmb(); /* Needed for RCU */
 }
@@ -579,7 +579,7 @@ static __always_inline bool ma_dead_node(const struct maple_node *node)
 {
 	struct maple_node *parent;
 	/* Do not reorder reads from the node prior to the parent check */
 	smp_rmb();
-	parent = (void *)((unsigned long)node->parent & ~MAPLE_NODE_MASK);
+	parent = (void *)((unsigned long)READ_ONCE(node->parent) & ~MAPLE_NODE_MASK);
 	return (parent == node);
 }
```
We hope this report is helpful.
Best regards,
RacePilot Team