Re: erofs pointer corruption and kernel crash
From: Arseniy Krasnov
Date: Sat Apr 11 2026 - 11:15:20 EST
10.04.2026 18:41, Gao Xiang wrote:
> Hi Arseniy,
>
> On 2026/4/10 21:27, Arseniy Krasnov wrote:
>>
>>
>> 10.04.2026 15:20, Gao Xiang wrote:
>>>
>>>
>>> On 2026/4/10 19:37, Arseniy Krasnov wrote:
>>>
>>> (dropping unrelated folks since they are all subscribed to the erofs mailing list)
>>>
>>>>
>>>>
>>>> 10.04.2026 11:31, Gao Xiang wrote:
>>>>> Hi,
>>>>>
>>>>> On 2026/4/10 16:13, Arseniy Krasnov wrote:
>>>>>> Hi,
>>>>>>
>>>>>> We found unexpected behaviour in erofs:
>>>>>>
>>>>>> There is a function in erofs, 'erofs_onlinefolio_end()'. It takes a pointer to
>>>>>> 'struct folio' as its first argument, and a loop inside this function
>>>>>> updates the 'private' field of the provided folio:
>>>>>>
>>>>>>     do {
>>>>>>         orig = atomic_read((atomic_t *)&folio->private);
>>>>>>         DBG_BUGON(orig <= 0);
>>>>>>         v = dirty << EROFS_ONLINEFOLIO_DIRTY;
>>>>>>         v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
>>>>>>     } while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
>>>>>>
>>>>>> Now we see that, in some rare cases, this function processes a folio whose
>>>>>> 'private' field holds a pointer, so this loop modifies some bits of that
>>>>>> pointer. Later the kernel dereferences the corrupted pointer and crashes.
>>>>>>
>>>>>> To catch this, the following small debug patch was used (i.e. we check whether the 'private' field is a pointer):
>>>>>>
>>>>>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>>>>>> index 33cb0a7330d2..b1d8deffec4d 100644
>>>>>> --- a/fs/erofs/data.c
>>>>>> +++ b/fs/erofs/data.c
>>>>>> @@ -238,6 +238,11 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
>>>>>> {
>>>>>> int orig, v;
>>>>>> + if (((uintptr_t)folio->private) & 0xffff000000000000) {
>>>>>
>>>>> No, if erofs_onlinefolio_end() is called, `folio->private`
>>>>> shouldn't be a pointer: it's just a counter inside, and
>>>>> storing a pointer there is unexpected.
>>>>>
>>>>> And since the folio is locked, it shouldn't call into
>>>>> try_to_free_buffers().
>>>>>
>>>>> Is it easy to reproduce? If yes, can you print other
>>>>> values like `folio->mapping` and `folio->index` as
>>>>> well?
>>>>>
>>>>> I need more information to find some clues.
>>>>
>>>>
>>>>
>>>> So I reproduced it again with this debug patch, which adds a magic value to 'struct z_erofs_pcluster' and prints 'struct folio'
>>>> when a pointer in 'private' is passed to 'erofs_onlinefolio_end()'. In short: 'private' points to 'struct z_erofs_pcluster'.
>>> First, erofs-utils 1.8.10 doesn't support `-E48bit`:
>>> only erofs-utils 1.9+ ships it, as an experimental
>>> feature (see the ChangeLog), so I think you're using
>>> a modified erofs-utils 1.8.10:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/ChangeLog
>>>
>>> ```
>>> erofs-utils 1.9
>>>
>>> * This release includes the following updates:
>>> - Add 48-bit layout support for larger filesystems (EXPERIMENTAL);
>>> ```
>>>
>>> Second, I'm pretty sure this issue is related to the
>>> experimental `-E48bit`, and the information so far is
>>> not enough for me to find the root cause, so I
>>> need to find a way to reproduce it myself. It may
>>> take time; you could debug it yourself, but I don't
>>> think it's an easy task if you aren't quite familiar
>>> with the EROFS codebase.
>>>
>>> Anyway, if you need a quick solution for production,
>>> I really suggest not using `-E48bit + zstd` like
>>> this for now: try other options like
>>> `-zzstd -C65536 -Efragments` instead, since those
>>> are common production choices.
>>
>> Ok, thanks for this advice! One more question: currently we use these options:
>> "zstd,22 --max-extent-bytes 65536 -E48bit". Ok, we remove "zstd,22" and "E48bit",
>> but what about "--max-extent-bytes 65536": is it considered a stable option?
>> Or is it better to use your version: "-zzstd -C65536 -Efragments"?
>
> I'm not sure how you found this
> "zstd,22 --max-extent-bytes 65536 -E48bit" combination.
>
> My suggestion, based on production use, is that as long as
> you don't use `-zzstd` + `-E48bit`, it should be fine.
>
> If you need smaller images, I suggest `-zlzma,9 -C65536 -Efragments`,
> or `-zlz4hc`, which is what Android uses everywhere,
> or zstd, but without `-E48bit`.
>
> As for "--max-extent-bytes 65536", it can be dropped:
> if `-E48bit` is not used, it only has negative
> impacts.
>
> In short, `-E48bit` + `-zzstd` + `--max-extent-bytes`
> enables the new unaligned compression for zstd, but it's
> a relatively new feature. I still need some time to
> stabilize it, but my own time is limited and everything
> always has to be prioritized.
Ok, thanks for this advice!
Thanks
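
For the record, the option sets suggested above would look roughly like this as
mkfs.erofs invocations. Only the flags come from this thread; the image name
(rootfs.erofs) and source directory (rootfs/) are placeholders I made up:

```
# Smaller images: LZMA level 9, 64 KiB physical clusters, fragments
mkfs.erofs -zlzma,9 -C65536 -Efragments rootfs.erofs rootfs/

# zstd without the experimental 48-bit layout
mkfs.erofs -zzstd -C65536 -Efragments rootfs.erofs rootfs/

# LZ4HC, as mentioned for Android
mkfs.erofs -zlz4hc rootfs.erofs rootfs/
```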
>
> Thanks,
> Gao Xiang
>
>>
>> Thanks
>>
>>>
>>> Thanks,
>>> Gao Xiang
>