Re: erofs pointer corruption and kernel crash

From: Arseniy Krasnov

Date: Fri Apr 10 2026 - 04:51:37 EST




10.04.2026 11:42, Gao Xiang wrote:
>
>
> On 2026/4/10 16:31, Gao Xiang wrote:
>> Hi,
>>
>> On 2026/4/10 16:13, Arseniy Krasnov wrote:
>>> Hi,
>>>
>>> We found unexpected behaviour in erofs:
>>>
>>> There is a function in erofs, 'erofs_onlinefolio_end()'. It takes a pointer to
>>> 'struct folio' as its first argument, and a loop inside this function
>>> updates the 'private' field of the provided folio:
>>>
>>>    do {
>>>            orig = atomic_read((atomic_t *)&folio->private);
>>>            DBG_BUGON(orig <= 0);
>>>            v = dirty << EROFS_ONLINEFOLIO_DIRTY;
>>>            v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
>>>    } while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
>>>
>>> Now, we see that in some rare cases this function processes a folio whose
>>> 'private' field is a pointer, so this loop updates some bits in that
>>> pointer. Later the kernel dereferences the pointer and crashes.
>>>
>>> To catch this, we used the following small debug patch (i.e. we check
>>> whether the 'private' field is a pointer):
>>>
>>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>>> index 33cb0a7330d2..b1d8deffec4d 100644
>>> --- a/fs/erofs/data.c
>>> +++ b/fs/erofs/data.c
>>> @@ -238,6 +238,11 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
>>>   {
>>>       int orig, v;
>>> +    if (((uintptr_t)folio->private) & 0xffff000000000000) {
>>
>> No, if erofs_onlinefolio_end() is called, `folio->private`
>> shouldn't be a pointer, it's just a counter inside, and
>> storing a pointer is unexpected.
>>
>> And since the folio is locked, it shouldn't call into
>> try_to_free_buffers().
>>
>> Is it easy to reproduce? If so, can you print other
>> values like `folio->mapping` and `folio->index` as
>> well?
>>
>> I need more information to find some clues.
>
> btw, is that an unmodified upstream kernel "6.15.11-sdkernel"?
> So far I have never heard of Android phone vendors using, for example,
> 6.12 LTS hitting this. If it can be easily reproduced, is it
> possible to reproduce it with the upstream kernel?

Yes, this is just the upstream kernel, with no vendor modifications. It is not
Android, just Buildroot.

>
> And is the "0xffff000002b32468" pointer a valid pointer? What
> does it point to? If it looks like an erofs pointer, the only one I
> can think of is "struct z_erofs_pcluster"; if that's not the
> case, I think something else must be wrong if the kernel
> is modified.

Yes, this is a valid pointer. I still need to check what it points to, and I'll
report back here.

Thanks

>
>>
>> Thanks,
>> Gao Xiang
>>
>>> +        pr_emerg("\n[foliodbg] %s:%d EROFS FOLIO %px PRIVATE BEFORE %px\n", __func__, __LINE__, folio, folio->private);
>>> +        dump_stack();
>>> +    }
>>> +
>>>       do {
>>>           orig = atomic_read((atomic_t *)&folio->private);
>>>           DBG_BUGON(orig <= 0);
>>> @@ -245,6 +250,9 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
>>>           v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
>>>       } while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
>>> +    if (((uintptr_t)folio->private) & 0xffff000000000000)
>>> +        pr_emerg("\n[foliodbg] %s:%d EROFS FOLIO %px PRIVATE SET %px\n", __func__, __LINE__, folio, folio->private);
>>> +
>>>       if (v & (BIT(EROFS_ONLINEFOLIO_DIRTY) - 1))
>>>           return;
>>>       folio->private = 0;
>>>
>>>
>>> And it gives the following result:
>>>
>>> [][  T639] [foliodbg] erofs_onlinefolio_end:242 EROFS FOLIO fffffdffc0030440 PRIVATE BEFORE ffff000002b32468
>>> [][  T639] CPU: 0 UID: 0 PID: 639 Comm: kworker/0:6H Tainted: G O 6.15.11-sdkernel #1 PREEMPT
>>> [][  T639] Tainted: [O]=OOT_MODULE
>>> [][  T639] Workqueue: kverityd verity_work
>>> [][  T639] Call trace:
>>> [][  T639]  show_stack+0x18/0x30 (C)
>>> [][  T639]  dump_stack_lvl+0x60/0x80
>>> [][  T639]  dump_stack+0x18/0x24
>>> [][  T639]  erofs_onlinefolio_end+0x124/0x130
>>> [][  T639]  z_erofs_decompress_queue+0x4b0/0x8c0
>>> [][  T639]  z_erofs_decompress_kickoff+0x88/0x150
>>> [][  T639]  z_erofs_endio+0x144/0x250
>>> [][  T639]  bio_endio+0x138/0x150
>>> [][  T639]  __dm_io_complete+0x1e0/0x2b0
>>> [][  T639]  clone_endio+0xd0/0x270
>>> [][  T639]  bio_endio+0x138/0x150
>>> [][  T639]  verity_finish_io+0x64/0xf0
>>> [][  T639]  verity_work+0x30/0x40
>>> [][  T639]  process_one_work+0x180/0x2e0
>>> [][  T639]  worker_thread+0x2c4/0x3f0
>>> [][  T639]  kthread+0x12c/0x210
>>> [][  T639]  ret_from_fork+0x10/0x20
>>> [][  T639]
>>> [][  T639] [foliodbg] erofs_onlinefolio_end:254 EROFS FOLIO fffffdffc0030440 PRIVATE SET ffff000022b32467
>>> [][   T39] Unable to handle kernel paging request at virtual address ffff000022b32467
>>> [][   T39] Mem abort info:
>>> [][   T39]   ESR = 0x0000000096000006
>>> [][   T39]   EC = 0x25: DABT (current EL), IL = 32 bits
>>> [][   T39]   SET = 0, FnV = 0
>>> [][   T39]   EA = 0, S1PTW = 0
>>> [][   T39]   FSC = 0x06: level 2 translation fault
>>> [][   T39] Data abort info:
>>> [][   T39]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
>>> [][   T39]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>> [][   T39]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>> [][   T39] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000001e36000
>>> [][   T39] [ffff000022b32467] pgd=1800000007fff403, p4d=1800000007fff403, pud=1800000007ffe403, pmd=0000000000000000
>>> [][   T39] Internal error: Oops: 0000000096000006 [#1]  SMP
>>> [][   T39] Modules linked in: vlsicomm(O)
>>> [][   T39] CPU: 1 UID: 0 PID: 39 Comm: kswapd0 Tainted: G O 6.15.11-sdkernel #1 PREEMPT
>>> [][   T39] Tainted: [O]=OOT_MODULE
>>> [][   T39] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [][   T39] pc : drop_buffers.constprop.0+0x34/0x120
>>> [][   T39] lr : try_to_free_buffers+0xd0/0x100
>>> [][   T39] sp : ffff80008105b780
>>> [][   T39] x29: ffff80008105b780 x28: 0000000000000000 x27: fffffdffc0030448
>>> [][   T39] x26: ffff80008105b8a0 x25: ffff80008105b868 x24: 0000000000000001
>>> [][   T39] x23: fffffdffc0030440 x22: ffff80008105b7b0 x21: fffffdffc0030440
>>> [][   T39] x20: ffff000022b32467 x19: ffff000022b32467 x18: 0000000000000000
>>> [][   T39] x17: 0000000000000000 x16: 0000000000000000 x15: 00000000d69f4cc0
>>> [][   T39] x14: ffff0000000c5dc0 x13: 0000000000000000 x12: ffff800080d59b58
>>> [][   T39] x11: 00000000000000c0 x10: 0000000000000000 x9 : 0000000000000000
>>> [][   T39] x8 : ffff80008105b7d0 x7 : 0000000000000000 x6 : 000000000000003f
>>> [][   T39] x5 : 0000000000000000 x4 : fffffdffc0030440 x3 : 1ff0000000004001
>>> [][   T39] x2 : 1ff0000000004001 x1 : ffff80008105b7b0 x0 : fffffdffc0030440
>>> [][   T39] Call trace:
>>> [][   T39]  drop_buffers.constprop.0+0x34/0x120 (P)
>>> [][   T39]  try_to_free_buffers+0xd0/0x100
>>> [][   T39]  filemap_release_folio+0x94/0xc0
>>> [][   T39]  shrink_folio_list+0x8c8/0xc40
>>> [][   T39]  shrink_lruvec+0x740/0xb80
>>> [][   T39]  shrink_node+0x2b8/0x9a0
>>> [][   T39]  balance_pgdat+0x3b8/0x760
>>> [][   T39]  kswapd+0x220/0x3b0
>>> [][   T39]  kthread+0x12c/0x210
>>> [][   T39]  ret_from_fork+0x10/0x20
>>> [][   T39] Code: 14000004 f9400673 eb13029f 54000180 (f9400262)
>>> [][   T39] ---[ end trace 0000000000000000 ]---
>>> [][   T39] Kernel panic - not syncing: Oops: Fatal exception
>>> [][   T39] SMP: stopping secondary CPUs
>>> [][   T39] Kernel Offset: disabled
>>> [][   T39] CPU features: 0x0000,00000000,01000000,0200420b
>>> [][   T39] Memory Limit: none
>>> [][   T39] Rebooting in 5 seconds..
>>>
>>> So 'erofs_onlinefolio_end()' receives a folio whose 'private' field contains
>>> a pointer (0xffff000002b32468) and "corrupts" it: the result is
>>> 0xffff000022b32467, i.e. the low 32 bits were decremented by one and
>>> 0x20000000, which is (1 << EROFS_ONLINEFOLIO_DIRTY), was ORed in. The kernel
>>> then crashes. We guess that passing such a folio to
>>> 'erofs_onlinefolio_end()' is not a valid case.
>>>
>>> We have the following erofs configuration in buildroot:
>>>
>>> BR2_TARGET_ROOTFS_EROFS=y
>>> BR2_TARGET_ROOTFS_EROFS_CUSTOM_COMPRESSION=y
>>> BR2_TARGET_ROOTFS_EROFS_COMPRESSION_ALGORITHMS="zstd,22 --max-extent-bytes 65536 -E48bit"
>>> BR2_TARGET_ROOTFS_EROFS_FRAGMENTS=y
>>> BR2_TARGET_ROOTFS_EROFS_PCLUSTERSIZE=65536
>>>
>>>
>>>
>>> Maybe you know how to fix it, or have some ideas? We are new to erofs and
>>> still need to explore and learn its source code.
>>>
>>> Thanks
>>
>