Re: [BUG] Memory ordering between kmalloc() and kfree()? it's confusing!
From: Harry Yoo
Date: Thu Mar 05 2026 - 21:48:50 EST
On Thu, Feb 26, 2026 at 03:35:08PM +0900, Harry Yoo wrote:
> Hello, SLAB, LKMM, and KCSAN folks!
[...snip...]
> # Now, let's take a look at the bug I've been investigating
>
> There were two bugs [3] [4] reported, with symptoms that appear to be
> caused by slab returning wrong metadata (the symptoms: incorrect
> reference counting of obj_cgroup, integer overflow as more memory is
> uncharged than charged).
>
> [3] https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@xxxxxxxxxxxxx
> [4] https://lore.kernel.org/all/ddff7c7d-c0c3-4780-808f-9a83268bbf0c@xxxxxxxxxxxxx
>
> Hmm, if it's returning wrong metadata, how could that happen?
>
> Well, perhaps either 1) the metadata address is calculated incorrectly,
> or 2) reading the metadata itself is racy.
>
> Shakeel Butt pointed out [9] that there's a potential memory ordering
> issue: without enforced ordering between slab->obj_exts and
> slab->stride, the metadata address calculation can go wrong.
>
> [9] https://lore.kernel.org/lkml/aZu9G9mVIVzSm6Ft@hyeyoo
>
> Let's say CPU X and Y are allocating/freeing slab objects from/to
> the same slab. They need to access metadata for the objects:
>
> CPU X CPU Y
>
> // CPU X allocates metadata array
> - slab->obj_exts = <the address of the metadata array>
> - slab->stride = 16 (sizeof(struct slabobj_ext))
>
> - stride = plain load slab->stride
> - obj_exts = READ_ONCE(slab->obj_exts)
> - if (obj_exts)
> - metadata_addr =
> stride * index + obj_exts
> - stride = plain load slab->stride
> - obj_exts = READ_ONCE(slab->obj_exts)
> - if (obj_exts)
> - metadata_addr = stride * index +
> obj_exts
>
> // Wait, obj_exts is non-NULL,
> // but slab->stride is stale!
>
> // Now, metadata_addr is wrong.
>
> Hmm, this could definitely happen when two CPUs try to allocate/free
> objects from/to the same slab. We need to make sure that no CPU can
> see a stale slab->stride once it observes a non-NULL slab->obj_exts.
>
> # How I tried to fix it
>
> An expensive solution would be to do:
>
> CPU X:                              CPU Y:
> - slab->stride = 16                 - READ_ONCE(slab->obj_exts)
> - smp_wmb()                         - if (obj_exts)
> - slab->obj_exts = <something>        - smp_rmb()
>                                       - plain load slab->stride
>
> Then, CPU Y should see either (obj_exts == 0), or
> (obj_exts != 0 and a valid stride). (obj_exts != 0) && (invalid stride)
> is impossible.
>
> This fix [5] seems to resolve the bug [6], yay!
>
> Before testing this fix, I wasn't fully convinced that it was a memory
> ordering issue. But after testing it, it seems reasonable to assume that
> it's indeed a memory ordering issue.

Apologies for the delay. I had to confirm that there was some confusion
in the analysis above.

It turns out that the smp_wmb()/smp_rmb() pair didn't really fix the
underlying problem [10]. The confusion was my assumption that the
reported bugs [5] [7] were caused by the lack of enforced memory ordering.

It's true that there was a theoretical memory ordering issue (now fixed
in 7.0-rc2 [7]), but the stride value was invalid because slab->stride
was declared as unsigned short, which was too small to hold every
possible stride value [9] [11].

So my earlier argument that "probably there is a user that violates
slab's assumption" no longer holds. That's a relief ;)
> [5] https://lore.kernel.org/linux-mm/aZ2Gwie5dpXotxWc@hyeyoo
> [6] https://lore.kernel.org/linux-mm/84492f08-04c2-485c-9a18-cdafd5a9c3e5@xxxxxxxxxxxxx
[9] https://lore.kernel.org/linux-mm/20260303135722.2680521-1-harry.yoo@xxxxxxxxxx
[10] https://lore.kernel.org/linux-mm/aaj--Lej6kWE0aV-@hyeyoo
[11] https://lore.kernel.org/linux-mm/41f1c856-2c41-4d11-96e6-079d95d8efbb@xxxxxxxxxxxxx
--
Cheers,
Harry / Hyeonggon