Re: erofs pointer corruption and kernel crash

From: Arseniy Krasnov

Date: Fri Apr 10 2026 - 09:41:00 EST




10.04.2026 15:20, Gao Xiang wrote:
>
>
> On 2026/4/10 19:37, Arseniy Krasnov wrote:
>
> (dropping unrelated folks since they are all subscribed to the erofs mailing list)
>
>>
>>
>> 10.04.2026 11:31, Gao Xiang wrote:
>>> Hi,
>>>
>>> On 2026/4/10 16:13, Arseniy Krasnov wrote:
>>>> Hi,
>>>>
>>>> We found unexpected behaviour of erofs:
>>>>
>>>> There is a function in erofs, 'erofs_onlinefolio_end()'. It takes a pointer to
>>>> 'struct folio' as its first argument, and there is a loop inside this function
>>>> which updates the 'private' field of the provided folio:
>>>>
>>>>     do {
>>>>             orig = atomic_read((atomic_t *)&folio->private);
>>>>             DBG_BUGON(orig <= 0);
>>>>             v = dirty << EROFS_ONLINEFOLIO_DIRTY;
>>>>             v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
>>>>     } while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
>>>>
>>>> Now, we see that in some rare cases this function processes a folio where
>>>> 'private' is a pointer, and thus this loop will update some bits in this
>>>> pointer. Later the kernel dereferences this pointer and crashes.
>>>>
>>>> To catch this, the following small debug patch was used (i.e. we check that the 'private' field is a pointer):
>>>>
>>>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>>>> index 33cb0a7330d2..b1d8deffec4d 100644
>>>> --- a/fs/erofs/data.c
>>>> +++ b/fs/erofs/data.c
>>>> @@ -238,6 +238,11 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
>>>>  {
>>>>      int orig, v;
>>>> +    if (((uintptr_t)folio->private) & 0xffff000000000000) {
>>>
>>> No, if erofs_onlinefolio_end() is called, `folio->private`
>>> shouldn't be a pointer, it's just a counter inside, and
>>> storing a pointer is unexpected.
>>>
>>> And since the folio is locked, it shouldn't call into
>>> try_to_free_buffers().
>>>
>>> Is it easy to reproduce? If so, can you print other
>>> values like `folio->mapping` and `folio->index` as
>>> well?
>>>
>>> I need more information to find some clues.
>>
>>
>>
>> So reproduced again with this debug patch which adds magic to 'struct z_erofs_pcluster' and prints 'struct folio'
>> when pointer in 'private' is passed to 'erofs_onlinefolio_end()'. In short - 'private' points to 'struct z_erofs_pcluster'.
> First, erofs-utils 1.8.10 doesn't support `-E48bit`:
> only erofs-utils 1.9+ ship it as an experimental
> feature, see Changelog; so I think you're using
> modified erofs-utils 1.8.10:
> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/ChangeLog
>
> ```
> erofs-utils 1.9
>
>  * This release includes the following updates:
>    - Add 48-bit layout support for larger filesystems (EXPERIMENTAL);
> ```
>
> Second, I'm pretty sure this issue is related to
> the experimental `-E48bit`, and that information is
> not enough for me to find the root cause, so I
> need to find a way to reproduce it myself. It may
> take time; you could debug it yourself, but I don't
> think it's an easy task if you aren't quite familiar
> with the EROFS codebase.

Also, here is some more information just caught with CONFIG_EROFS_FS_DEBUG. It is the same problem, but the enabled
debug logic BUGed the kernel earlier. It may be useful for you.

Thanks


[ 368.587000][ T608] ------------[ cut here ]------------
[ 368.587079][ T608] kernel BUG at fs/erofs/zdata.c:1606!
[ 368.591977][ T608] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 368.593622][ T1214] ------------[ cut here ]------------
[ 368.598779][ T608] Modules linked in: vlsicomm(O)
[ 368.604040][ T1214] kernel BUG at fs/erofs/zdata.c:1606!
[ 368.608787][ T608] CPU: 1 UID: 0 PID: 608 Comm: kworker/1:3H Tainted: G O 6.15.11-sdkernel #1 PREEMPT
[ 368.624876][ T608] Tainted: [O]=OOT_MODULE
[ 368.635015][ T608] Workqueue: kverityd verity_work
[ 368.639844][ T608] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 368.647428][ T608] pc : z_erofs_endio+0x220/0x270
[ 368.652172][ T608] lr : z_erofs_endio+0x23c/0x270
[ 368.656920][ T608] sp : ffff80008215bbe0
[ 368.660887][ T608] x29: ffff80008215bbe0 x28: ffff0000032feb40 x27: ffff0000032feb80
[ 368.668646][ T608] x26: fffffdffc0029280 x25: 0000000000000009 x24: ffff000000be17e0
[ 368.676408][ T608] x23: ffff000007e85c00 x22: 0000000000001000 x21: 0000000000001000
[ 368.684170][ T608] x20: 0000000000000000 x19: 0000000000001000 x18: 00000000e6fb12fd
[ 368.691933][ T608] x17: 00000000c98c11f0 x16: 00000000ac7e39e2 x15: 00000000c3362985
[ 368.699696][ T608] x14: 0000000001040820 x13: 00000000a3bddb58 x12: ffff80008215bb68
[ 368.707458][ T608] x11: 0000000049a63821 x10: ffff8000809febe0 x9 : 0000000000000000
[ 368.715221][ T608] x8 : ffff000003cee8e8 x7 : 0000000000000000 x6 : 459ea227f0118cc9
[ 368.722983][ T608] x5 : 0000000000000000 x4 : 1ff0000000004021 x3 : 0000000000000000
[ 368.730746][ T608] x2 : 0000000000000000 x1 : ffff0000029f3e00 x0 : fffffdffc0029240
[ 368.738513][ T608] Call trace:
[ 368.741619][ T608] z_erofs_endio+0x220/0x270 (P)
[ 368.746362][ T608] bio_endio+0x138/0x150
[ 368.750411][ T608] __dm_io_complete+0x1e0/0x2b0
[ 368.755068][ T608] clone_endio+0xd0/0x270
[ 368.759213][ T608] bio_endio+0x138/0x150
[ 368.763262][ T608] verity_finish_io+0x64/0xf0
[ 368.767747][ T608] verity_work+0x30/0x40
[ 368.771800][ T608] process_one_work+0x180/0x2e0
[ 368.776463][ T608] worker_thread+0x2c4/0x3f0
[ 368.780862][ T608] kthread+0x12c/0x210
[ 368.784742][ T608] ret_from_fork+0x10/0x20
[ 368.788979][ T608] Code: 17ffffc8 f9401401 b100103f 54fff5a0 (d4210000)
[ 368.795698][ T608] ---[ end trace 0000000000000000 ]---
[ 368.813672][ T608] Kernel panic - not syncing: Oops - BUG: Fatal exception
[ 368.815015][ T608] SMP: stopping secondary CPUs
[ 369.896670][ T608] SMP: failed to stop secondary CPUs 0
[ 369.896729][ T608] Kernel Offset: disabled
[ 369.900508][ T608] CPU features: 0x0000,00000000,01000000,0200420b
[ 369.906718][ T608] Memory Limit: none
[ 369.922397][ T608] Rebooting in 5 seconds..



>
> Anyway, if you need a quick solution for production,
> I really suggest not using `-E48bit + zstd` like
> this for now: try other options like
> `-zzstd -C65536 -Efragments` instead, since those
> are common production choices.
>
> Thanks,
> Gao Xiang