Re: erofs pointer corruption and kernel crash
From: Gao Xiang
Date: Fri Apr 10 2026 - 04:59:57 EST
On 2026/4/10 16:51, Arseniy Krasnov wrote:
10.04.2026 11:42, Gao Xiang wrote:
On 2026/4/10 16:31, Gao Xiang wrote:
Hi,
On 2026/4/10 16:13, Arseniy Krasnov wrote:
Hi,
We found unexpected behaviour of erofs:
There is a function in erofs, 'erofs_onlinefolio_end()'. It takes a pointer to
'struct folio' as its first argument, and a loop inside this function
updates the 'private' field of the provided folio:
do {
	orig = atomic_read((atomic_t *)&folio->private);
	DBG_BUGON(orig <= 0);
	v = dirty << EROFS_ONLINEFOLIO_DIRTY;
	v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
} while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
Now we see that, in some rare cases, this function processes a folio whose
'private' field holds a pointer, so this loop modifies some bits of that
pointer. Later the kernel dereferences the corrupted pointer and crashes.
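To make the effect concrete, here is a minimal userspace sketch of what one pass of that loop would do to a 64-bit 'private' value. It is only an illustration under stated assumptions: the bit positions 30 and 29 for EROFS_ONLINEFOLIO_EIO and EROFS_ONLINEFOLIO_DIRTY are assumed to match fs/erofs/data.c, the machine is assumed to be little-endian (so the atomic_t overlay covers the low 32 bits), and both helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed to match fs/erofs/data.c; verify against your tree. */
#define EROFS_ONLINEFOLIO_EIO   30
#define EROFS_ONLINEFOLIO_DIRTY 29

/*
 * Hypothetical helper: the same heuristic as the debug patch, i.e. treat
 * a value with any of the top 16 bits set as a kernel pointer.
 */
static int looks_like_kernel_pointer(uint64_t private_val)
{
	return (private_val & 0xffff000000000000ULL) != 0;
}

/*
 * Hypothetical helper: model a single, uncontended pass of the loop.
 * The atomic_t view only covers the low 32 bits (little-endian assumed),
 * so the high 32 bits of 'private' are carried over unchanged.
 */
static uint64_t simulate_onlinefolio_end(uint64_t private_val, int err, int dirty)
{
	uint32_t orig = (uint32_t)private_val;	/* what atomic_read() would see */
	uint32_t v;

	v = (uint32_t)!!dirty << EROFS_ONLINEFOLIO_DIRTY;
	v |= (orig - 1u) | ((uint32_t)!!err << EROFS_ONLINEFOLIO_EIO);

	return (private_val & 0xffffffff00000000ULL) | v;
}
```

With a plain counter such as 2, the result is 1, as intended. With a pointer-like value such as 0xffff000002b32468, the low word is decremented and the result is 0xffff000002b32467, an address that is off by one and no longer aligned.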
To catch this, we used the following small debug patch (i.e. we check whether the 'private' field is a pointer):
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 33cb0a7330d2..b1d8deffec4d 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -238,6 +238,11 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
{
int orig, v;
+ if (((uintptr_t)folio->private) & 0xffff000000000000) {
No, if erofs_onlinefolio_end() is called, `folio->private`
shouldn't be a pointer, it's just a counter inside, and
storing a pointer is unexpected.
And since the folio is locked, it shouldn't call into
try_to_free_buffers().
Is it easy to reproduce? If yes, can you print other
values like `folio->mapping` and `folio->index` as
well?
I need more information to find some clues.
btw, is that an unmodified upstream kernel "6.15.11-sdkernel"?
So far I have never heard of Android phone vendors (using 6.12 LTS,
for example) hitting this. If it can be easily reproduced, is it
possible to reproduce it with the upstream kernel?
Yes, this is just the upstream kernel, no vendor modifications. It is not Android, just
buildroot.
I know, I mean for buildroot workloads, there should be
less pressure since it's just for embedded use.
And is the "0xffff000002b32468" pointer a valid pointer? What
does it point to? If it looks like an erofs pointer, the only one I
can think of is "struct z_erofs_pcluster"; if that's not the
case, I think something else must be wrong if the kernel
is modified.
Yes, this is a valid pointer; I need to check what it points to. I'll report back here.
Anyway, if z_erofs_decompress_queue->erofs_onlinefolio_end()
is called:
- the folio should be locked, and folio->private should not
be a pointer;
- it seems `PG_private` is set on the problematic folio
(otherwise try_to_free_buffers() won't be called), which
is unexpected too.
So what I need for further analysis is:
- the folio structure (folio flags, mapping, index, count, etc.);
- what does folio->private point to?
Also, could I get a memory dump if possible?
I'm not quite sure whether that's feasible in a buildroot environment.
Thanks,
Gao Xiang
Thanks
Thanks,
Gao Xiang