[BUG] ext4: KCSAN: lockless i_es_all_nr reads in es_shrinker_info

From: Shuhao Fu

Date: Thu Apr 23 2026 - 13:38:11 EST


Hi,

Reading /proc/fs/ext4/<sb>/es_shrinker_info can overlap with extent-status
updates and trigger KCSAN reports on the per-inode ES counters (I saw this on
i_es_all_nr; i_es_shk_nr is read the same way in this proc path). As far as I
can tell, the user-visible impact is limited to stale or inconsistent procfs
stats output; I have no evidence of corruption or a crash from this path.

I reproduced this on a local KCSAN-instrumented tree based on linux commit
d8a9a4b11a13, using an x86_64 QEMU workload with userspace reader/writer loops.
To increase the race window, I added small debug-only hooks in my local tree:
after the writer updates the counter, it briefly delays and records which inode
it just touched; the proc reader then samples that inode's counters during the
s_es_list walk. I also wrapped the i_es_all_nr load in a local helper
ext4_es_shrinker_read_all_nr() so the read-side stack has a stable symbol;
upstream reads happen directly in ext4_seq_es_shrinker_info_show().

With that setup, KCSAN prints the following summary line (naming the two
racing functions):

BUG: KCSAN: data-race in ext4_es_init_extent / ext4_es_shrinker_read_all_nr

The first clean hit in my local log was:

read to 0xffff917cc15222c8 of 4 bytes by task 107 on cpu 0:
 ext4_es_shrinker_read_all_nr+0x26/0x50
 ext4_es_kcsan_probe_hot_inode+0x2b9/0x400
 ext4_seq_es_shrinker_info_show+0x9b/0xd40
 ...
 __x64_sys_sendfile64+0xc2/0x100
 do_syscall_64+0x13f/0x3c0

write (reordered) to 0xffff917cc15222c8 of 4 bytes by task 108 on cpu 2:
 ext4_es_init_extent+0x6aa/0xa00
 __es_insert_extent+0x477/0xaa0
 ...
 ext4_do_fallocate+0x127/0x310
 __x64_sys_fallocate+0x75/0xb0

I then saw the same pair again later in the same run (for example at log
timestamps 129.529391 and 129.579938), still on the same 4-byte address.

It looks like i_es_all_nr and i_es_shk_nr are documented as protected by
i_es_lock, and writers update them under i_es_lock, but
ext4_seq_es_shrinker_info_show() reads them while walking the list under
s_es_lock (the list lock), not i_es_lock.
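
To make the lock mismatch concrete, here is a small userspace model of the
pattern. It is only an analogue, not kernel code: the struct and function
names (model_inode, model_sb, model_es_insert, model_stats_max) are
hypothetical, pthread mutexes stand in for i_es_lock and s_es_lock, and a C11
relaxed atomic stands in for the counter so that the unlocked read is
well-defined in userspace. The kernel's data_race() does not make an access
atomic; it only marks the race as intentional for KCSAN.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for ext4_inode_info; not kernel code. */
struct model_inode {
	pthread_mutex_t es_lock;	/* models i_es_lock */
	atomic_uint es_all_nr;		/* models i_es_all_nr */
	struct model_inode *next;	/* models the i_es_list linkage */
};

/* Hypothetical stand-in for ext4_sb_info. */
struct model_sb {
	pthread_mutex_t list_lock;	/* models s_es_lock */
	struct model_inode *head;	/* models s_es_list */
};

/* Writer side: the counter is updated under the per-inode lock only. */
static void model_es_insert(struct model_inode *mi)
{
	pthread_mutex_lock(&mi->es_lock);
	atomic_fetch_add_explicit(&mi->es_all_nr, 1, memory_order_relaxed);
	pthread_mutex_unlock(&mi->es_lock);
}

/*
 * Stats reader: walks the list under the list lock and never takes
 * es_lock, so the value it samples may already be stale. Returns the
 * largest counter seen and stores the number of inodes walked.
 */
static unsigned int model_stats_max(struct model_sb *sb,
				    unsigned int *inode_cnt)
{
	unsigned int max = 0;

	*inode_cnt = 0;
	pthread_mutex_lock(&sb->list_lock);
	for (struct model_inode *mi = sb->head; mi; mi = mi->next) {
		/* Approximate read: a different lock than the writer holds. */
		unsigned int nr = atomic_load_explicit(&mi->es_all_nr,
						       memory_order_relaxed);

		(*inode_cnt)++;
		if (nr > max)
			max = nr;
	}
	pthread_mutex_unlock(&sb->list_lock);
	return max;
}
```

The two locks never exclude each other, so the reader and writer can touch
the counter concurrently. That is harmless for a statistics snapshot, but in
the kernel the plain load is exactly what KCSAN flags.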

The reproducer shape from normal userspace APIs is one reader loop running
cat /proc/fs/ext4/<sb>/es_shrinker_info while a writer loop runs fallocate,
buffered writes, punch-hole, and truncate on the same filesystem.
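
A minimal userspace sketch of that shape, for illustration only. The helper
names (read_stats_once, churn_extents_once) and the sizes and offsets are
assumptions chosen for the example, and the proc path is left to the caller
because <sb> depends on the device.

```c
#define _GNU_SOURCE		/* for fallocate() and FALLOC_FL_* */
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reader side: read the whole file at 'path' once, as cat would.
 * Returns the number of bytes read, or -1 on error.
 */
static long read_stats_once(const char *path)
{
	char buf[4096];
	long total = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;
	close(fd);
	return n < 0 ? -1 : total;
}

/*
 * Writer side: one round of extent churn on an already-open file.
 * Sizes and offsets are arbitrary. Returns 0 on success, -1 on error.
 */
static int churn_extents_once(int fd)
{
	char block[4096];

	memset(block, 0x5a, sizeof(block));

	/* Preallocate 1 MiB. */
	if (fallocate(fd, 0, 0, 1 << 20))
		return -1;
	/* Buffered write into the first block. */
	if (pwrite(fd, block, sizeof(block), 0) != (ssize_t)sizeof(block))
		return -1;
	/* Punch a hole covering the second and third blocks. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      4096, 8192))
		return -1;
	/* Truncate back to empty so the next round starts fresh. */
	if (ftruncate(fd, 0))
		return -1;
	return 0;
}
```

One process (or thread) would call read_stats_once() on the es_shrinker_info
path in a loop while another calls churn_extents_once() in a loop on a file
on the same ext4 filesystem.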

Since this appears to be an observational procfs stats path, would you prefer
marking these loads with data_race(...) so the intentionally approximate reads
are explicit and this path stops generating repeated KCSAN warnings?

The rough change I had in mind is:

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index ... .. ...
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@
 int ext4_seq_es_shrinker_info_show(struct seq_file *seq, void *v)
 {
 	...
 	list_for_each_entry(ei, &sbi->s_es_list, i_es_list) {
 		inode_cnt++;
+		ei_all_nr = data_race(ei->i_es_all_nr);
+		ei_shk_nr = data_race(ei->i_es_shk_nr);
 		...
 	}

If this direction is preferred, I can send a formal patch.

Thanks,
Shuhao