erofs pointer corruption and kernel crash
From: Arseniy Krasnov
Date: Fri Apr 10 2026 - 04:18:49 EST
Hi,
We found unexpected behaviour in erofs.
The function 'erofs_onlinefolio_end()' takes a pointer to 'struct folio' as
its first argument and contains a loop that updates the 'private' field of
that folio:
	do {
		orig = atomic_read((atomic_t *)&folio->private);
		DBG_BUGON(orig <= 0);
		v = dirty << EROFS_ONLINEFOLIO_DIRTY;
		v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
	} while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
In some rare cases this function is called on a folio whose 'private' field
holds a pointer, so the loop modifies bits of that pointer. Later the kernel
dereferences the modified pointer and crashes.
To catch this, we used the following small debug patch (i.e. we check whether
the 'private' field looks like a pointer):
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 33cb0a7330d2..b1d8deffec4d 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -238,6 +238,11 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
{
int orig, v;
+ if (((uintptr_t)folio->private) & 0xffff000000000000) {
+ pr_emerg("\n[foliodbg] %s:%d EROFS FOLIO %px PRIVATE BEFORE %px\n", __func__, __LINE__, folio, folio->private);
+ dump_stack();
+ }
+
do {
orig = atomic_read((atomic_t *)&folio->private);
DBG_BUGON(orig <= 0);
@@ -245,6 +250,9 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
} while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
+ if (((uintptr_t)folio->private) & 0xffff000000000000)
+ pr_emerg("\n[foliodbg] %s:%d EROFS FOLIO %px PRIVATE SET %px\n", __func__, __LINE__, folio, folio->private);
+
if (v & (BIT(EROFS_ONLINEFOLIO_DIRTY) - 1))
return;
folio->private = 0;
It produces the following output:
[][ T639] [foliodbg] erofs_onlinefolio_end:242 EROFS FOLIO fffffdffc0030440 PRIVATE BEFORE ffff000002b32468
[][ T639] CPU: 0 UID: 0 PID: 639 Comm: kworker/0:6H Tainted: G O 6.15.11-sdkernel #1 PREEMPT
[][ T639] Tainted: [O]=OOT_MODULE
[][ T639] Workqueue: kverityd verity_work
[][ T639] Call trace:
[][ T639] show_stack+0x18/0x30 (C)
[][ T639] dump_stack_lvl+0x60/0x80
[][ T639] dump_stack+0x18/0x24
[][ T639] erofs_onlinefolio_end+0x124/0x130
[][ T639] z_erofs_decompress_queue+0x4b0/0x8c0
[][ T639] z_erofs_decompress_kickoff+0x88/0x150
[][ T639] z_erofs_endio+0x144/0x250
[][ T639] bio_endio+0x138/0x150
[][ T639] __dm_io_complete+0x1e0/0x2b0
[][ T639] clone_endio+0xd0/0x270
[][ T639] bio_endio+0x138/0x150
[][ T639] verity_finish_io+0x64/0xf0
[][ T639] verity_work+0x30/0x40
[][ T639] process_one_work+0x180/0x2e0
[][ T639] worker_thread+0x2c4/0x3f0
[][ T639] kthread+0x12c/0x210
[][ T639] ret_from_fork+0x10/0x20
[][ T639]
[][ T639] [foliodbg] erofs_onlinefolio_end:254 EROFS FOLIO fffffdffc0030440 PRIVATE SET ffff000022b32467
[][ T39] Unable to handle kernel paging request at virtual address ffff000022b32467
[][ T39] Mem abort info:
[][ T39] ESR = 0x0000000096000006
[][ T39] EC = 0x25: DABT (current EL), IL = 32 bits
[][ T39] SET = 0, FnV = 0
[][ T39] EA = 0, S1PTW = 0
[][ T39] FSC = 0x06: level 2 translation fault
[][ T39] Data abort info:
[][ T39] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[][ T39] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[][ T39] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[][ T39] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000001e36000
[][ T39] [ffff000022b32467] pgd=1800000007fff403, p4d=1800000007fff403, pud=1800000007ffe403, pmd=0000000000000000
[][ T39] Internal error: Oops: 0000000096000006 [#1] SMP
[][ T39] Modules linked in: vlsicomm(O)
[][ T39] CPU: 1 UID: 0 PID: 39 Comm: kswapd0 Tainted: G O 6.15.11-sdkernel #1 PREEMPT
[][ T39] Tainted: [O]=OOT_MODULE
[][ T39] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[][ T39] pc : drop_buffers.constprop.0+0x34/0x120
[][ T39] lr : try_to_free_buffers+0xd0/0x100
[][ T39] sp : ffff80008105b780
[][ T39] x29: ffff80008105b780 x28: 0000000000000000 x27: fffffdffc0030448
[][ T39] x26: ffff80008105b8a0 x25: ffff80008105b868 x24: 0000000000000001
[][ T39] x23: fffffdffc0030440 x22: ffff80008105b7b0 x21: fffffdffc0030440
[][ T39] x20: ffff000022b32467 x19: ffff000022b32467 x18: 0000000000000000
[][ T39] x17: 0000000000000000 x16: 0000000000000000 x15: 00000000d69f4cc0
[][ T39] x14: ffff0000000c5dc0 x13: 0000000000000000 x12: ffff800080d59b58
[][ T39] x11: 00000000000000c0 x10: 0000000000000000 x9 : 0000000000000000
[][ T39] x8 : ffff80008105b7d0 x7 : 0000000000000000 x6 : 000000000000003f
[][ T39] x5 : 0000000000000000 x4 : fffffdffc0030440 x3 : 1ff0000000004001
[][ T39] x2 : 1ff0000000004001 x1 : ffff80008105b7b0 x0 : fffffdffc0030440
[][ T39] Call trace:
[][ T39] drop_buffers.constprop.0+0x34/0x120 (P)
[][ T39] try_to_free_buffers+0xd0/0x100
[][ T39] filemap_release_folio+0x94/0xc0
[][ T39] shrink_folio_list+0x8c8/0xc40
[][ T39] shrink_lruvec+0x740/0xb80
[][ T39] shrink_node+0x2b8/0x9a0
[][ T39] balance_pgdat+0x3b8/0x760
[][ T39] kswapd+0x220/0x3b0
[][ T39] kthread+0x12c/0x210
[][ T39] ret_from_fork+0x10/0x20
[][ T39] Code: 14000004 f9400673 eb13029f 54000180 (f9400262)
[][ T39] ---[ end trace 0000000000000000 ]---
[][ T39] Kernel panic - not syncing: Oops: Fatal exception
[][ T39] SMP: stopping secondary CPUs
[][ T39] Kernel Offset: disabled
[][ T39] CPU features: 0x0000,00000000,01000000,0200420b
[][ T39] Memory Limit: none
[][ T39] Rebooting in 5 seconds..
So 'erofs_onlinefolio_end()' receives a folio whose 'private' field contains
a pointer (0xffff000002b32468) and "corrupts" it: the result is
0xffff000022b32467, i.e. the low part was decremented by one and 0x20000000,
which is (1 << EROFS_ONLINEFOLIO_DIRTY), was ORed into the original pointer.
The kernel then crashes in drop_buffers() when it dereferences that value.
We guess that passing such a folio to 'erofs_onlinefolio_end()' is not a
valid case.
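To illustrate the arithmetic, here is a small user-space sketch (not kernel
code) that models one pass of the loop on the value from the log. The names
'model_onlinefolio_end()' and 'private_looks_like_pointer()' are ours, for
illustration only. Bit 29 for the dirty flag follows from the 0x20000000 seen
above; bit 30 for the EIO flag is an assumption. We also assume a
little-endian 64-bit layout, so the 'atomic_t' cast covers only the low 32
bits of 'private'.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions; DIRTY = 29 matches the 0x20000000 in the log. */
#define MODEL_ONLINEFOLIO_EIO   30
#define MODEL_ONLINEFOLIO_DIRTY 29

/* Same test as in the debug patch: are any of the top 16 bits set? */
static int private_looks_like_pointer(uint64_t private_val)
{
	return (private_val & 0xffff000000000000ULL) != 0;
}

/*
 * Model of one pass of the update loop. Only the low 32 bits of the
 * 64-bit 'private' value are read and rewritten; the high 32 bits are
 * carried over unchanged.
 */
static uint64_t model_onlinefolio_end(uint64_t private_val, int err, int dirty)
{
	uint32_t orig = (uint32_t)private_val;
	uint32_t v;

	/* Stand-in for DBG_BUGON(orig <= 0): does not fire for 0x02b32468. */
	assert((int32_t)orig > 0);

	v = (uint32_t)dirty << MODEL_ONLINEFOLIO_DIRTY;
	v |= (orig - 1) | ((uint32_t)!!err << MODEL_ONLINEFOLIO_EIO);

	return (private_val & 0xffffffff00000000ULL) | v;
}
```

Calling 'model_onlinefolio_end(0xffff000002b32468ULL, 0, 1)' returns
0xffff000022b32467, the same value as in the "PRIVATE SET" line of the log.
Note that the low 32 bits of the pointer are a positive 'int', so a check like
'DBG_BUGON(orig <= 0)' cannot detect this case.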
We have the following erofs configuration in buildroot:
BR2_TARGET_ROOTFS_EROFS=y
BR2_TARGET_ROOTFS_EROFS_CUSTOM_COMPRESSION=y
BR2_TARGET_ROOTFS_EROFS_COMPRESSION_ALGORITHMS="zstd,22 --max-extent-bytes 65536 -E48bit"
BR2_TARGET_ROOTFS_EROFS_FRAGMENTS=y
BR2_TARGET_ROOTFS_EROFS_PCLUSTERSIZE=65536
Maybe you know how to fix this, or have some ideas? We are new to erofs and
still need to study its source code.
Thanks