Re: [BUG] Memory ordering between kmalloc() and kfree()? it's confusing!

From: Harry Yoo

Date: Thu Feb 26 2026 - 12:41:41 EST

On Thu, Feb 26, 2026 at 10:45:55AM -0500, Alan Stern wrote:
> On Thu, Feb 26, 2026 at 03:35:08PM +0900, Harry Yoo wrote:
> > Hello, SLAB, LKMM, and KCSAN folks!
> >
> > I'd like to discuss slab's assumption on users regarding memory ordering.
> >
> > Recently, I've been investigating an interesting slab memory ordering
> > issue [3] [4] in v7.0-rc1, which made me think about memory ordering
> > for slab objects.
> >
> > But without answering "What does slab expect users to do for correct
> > operation?", I kept getting puzzled, and my brain hurt too much :/
> > I'm writing things down to stop getting confused :)
> >
> > Since I have never thought about this before, my reasoning could be
> > partially or entirely incorrect. If so, please kindly let me know.
> >
> > # Slab's assumption: Stores to object, its metadata, or struct slab
> > # must be visible to the CPU that frees the object, when it is
> > # passed to kfree(). It's users' responsibility to guarantee that.
> >
> > When the slab allocator allocates an object, it updates its metadata and
> > struct slab fields. After allocation, the user of slab updates object's
> > content. As long as the object is freed on the same CPU that allocated
> > it, kfree() can see those stores (a CPU can always see what's in its
> > own store buffer), so no problem!
> >
> > However, when, for example, the pointer to the object is stored in a
> > shared variable and the object is then freed on a different CPU, things
> > become trickier.
> >
> > In this case, I think it's fair for the slab allocator to assume that:
> >
> > 1) Such stores must involve _at least_ a release barrier
> > (for example, via {cmp,}xchg{,_release}, or smp_store_release())
> > to ensure preceding stores are visible to other CPUs before
> > the pointer store becomes visible, and
> >
> > 2) The CPU that frees an object must invoke at least an acquire
> > barrier to ensure that stores to object content / metadata, etc.,
> > are visible to the freeing CPU when it calls kfree().
> >
> > Because the slab allocator itself doesn't guarantee that such
> > barriers are invoked within the allocator, it relies on users to
> > do this when needed.
>
> It doesn't? Then how does the slab allocator guarantee that two
> different CPUs won't try to perform allocations or deallocations from
> the same slab at the same time, messing everything up?

Ah, the alloc/free slowpaths do use cmpxchg128 or a spinlock, so they
don't mess things up.

But fastpath allocs/frees are served from a percpu array that is protected
by a local_lock. local_lock has a compiler barrier in it, but that's
not enough to order stores between CPUs.
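
To make assumptions 1) and 2) quoted above concrete, here is a minimal
userspace sketch, not kernel code. It assumes C11 atomics and pthreads as
stand-ins: malloc()/free() play the role of kmalloc()/kfree(), a release
store plays the role of smp_store_release(), and an acquire load plays the
role of smp_load_acquire(). The names struct obj, shared_slot, producer(),
consumer() and run_handoff() are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object type; stands in for any slab-allocated structure. */
struct obj {
	int payload;
};

/* Shared variable used to hand the object from one thread to the other. */
static _Atomic(struct obj *) shared_slot;

/* Allocating side: initialize the object, then publish it with release. */
static void *producer(void *arg)
{
	struct obj *o = malloc(sizeof(*o));	/* stand-in for kmalloc() */

	if (!o)
		abort();
	o->payload = *(int *)arg;		/* plain store to object content */
	/* Release: the payload store is ordered before the pointer store. */
	atomic_store_explicit(&shared_slot, o, memory_order_release);
	return NULL;
}

/* Freeing side: acquire the pointer before touching or freeing the object. */
static void *consumer(void *arg)
{
	struct obj *o;

	/* Acquire: pairs with the release store in producer(). */
	while (!(o = atomic_load_explicit(&shared_slot, memory_order_acquire)))
		;
	*(int *)arg = o->payload;
	free(o);				/* stand-in for kfree() */
	return NULL;
}

/* Run one handoff and return the payload seen by the freeing thread. */
int run_handoff(int value)
{
	pthread_t p, c;
	int seen = -1;
	int rc;

	atomic_store_explicit(&shared_slot, NULL, memory_order_relaxed);
	if (pthread_create(&p, NULL, producer, &value))
		return -1;
	if (pthread_create(&c, NULL, consumer, &seen)) {
		/* No consumer: reclaim the object ourselves after the join. */
		pthread_join(p, NULL);
		free(atomic_load_explicit(&shared_slot, memory_order_acquire));
		return -1;
	}
	rc = pthread_join(p, NULL);
	assert(rc == 0);
	rc = pthread_join(c, NULL);
	assert(rc == 0);
	(void)rc;
	atomic_store_explicit(&shared_slot, NULL, memory_order_relaxed);
	return seen;
}
```

Calling run_handoff(42) should return 42. The point of the sketch is only
the pairing: if either side were downgraded to a relaxed access, nothing
would order the payload store against the free on the other thread.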

> Can you explain how this is meant to work, for those of us who don't
> know anything about the slab allocator's internal mechanism?

--
Cheers,
Harry / Hyeonggon