Re: [PATCH RFC 0/4] memcg,slab: kmalloc_nolock() fixes

From: Harry Yoo

Date: Wed Jun 24 2026 - 16:20:00 EST




On 6/25/26 1:30 AM, Alexei Starovoitov wrote:
> On Wed Jun 24, 2026 at 6:11 AM PDT, Harry Yoo (Oracle) wrote:
>>
>> Bug 1 was reported by lockdep, and bugs 2 [2] and 3 [3] were
>> reported by Sashiko.
>
> ... and in fixes for sashiko complains sashiko finds more issues.
> I don't think it will ever end. I suggest to fix realistic scenarios
> instead of one out of billion cases that sashiko think is plausible
> but will never be hit in reality.

But we can trigger debug warnings for the first two bugs fairly
easily with slub_kunit. Doesn't that count as a realistic scenario?

(Ok, I admit that the last bug was purely theoretical, and I would not
have bothered if the fix were not straightforward.)

You might argue that they are not as urgent as we might assume
(e.g., it's okay not to fix them ASAP or backport the fixes), but I
don't think we can just ignore them.

It might be a bit harder to cause an actual deadlock than to
trigger a debug warning, though. We can discuss that [1] [2].

> The chance of server crashing
> due to cosmic rays are higher than such bugs.

I'm not convinced that it's the case.

Well, I don't know what the chances are of calling kmalloc_nolock()
in NMI, or within slab or memcg (via tracing), and that is an important
factor here.

>> To BPF folks: do we need to backport kmalloc_nolock() support
>> for architectures without __CMPXCHG_DOUBLE to v6.18?
>
> nope.

Thanks, that was what I was hoping :)

# The discussion

[1] Bug 1: Freeing a slab object via kfree_nolock() or draining
the stock in kmalloc_nolock() happens very frequently. If the objcg
stock or a slab object is holding the last reference, the objcg must
already have been reparented at some point (reparenting happens upon
cgroup removal, which is not too rare).

Can this cause an actual deadlock? That depends on the chances of
calling kmalloc/kfree_nolock() in the middle of reparenting (see
reparent_[un]locks()) or objcg list manipulation under objcg_lock.
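
To make the re-entrancy window concrete, here is a userspace toy model
of it. This is only a sketch under assumptions: every toy_* name is
hypothetical, the "lock" is a plain flag rather than a real spinlock,
and it does not claim to reflect how the patches in this series fix
the problem. It only shows why a _nolock() path that runs while the
interrupted context holds a non-recursive lock must back off instead
of waiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive lock (e.g. objcg_lock). */
struct toy_lock {
	bool held;
};

static struct toy_lock toy_list_lock;

/* Take the lock only if it is free; never wait for it. */
static bool toy_trylock(struct toy_lock *lock)
{
	if (lock->held)
		return false;
	lock->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *lock)
{
	assert(lock->held);	/* unlocking a free lock is a bug */
	lock->held = false;
}

/*
 * Models a *_nolock() path that needs the lock. Waiting here would
 * deadlock if the lock owner is the context we interrupted, so the
 * only safe options are to succeed immediately or to report failure.
 */
static bool toy_release_nolock(void)
{
	if (!toy_trylock(&toy_list_lock))
		return false;	/* owner was interrupted; caller must defer */
	/* ... list manipulation would go here ... */
	toy_unlock(&toy_list_lock);
	return true;
}

/*
 * Models the problematic window: the lock is held (as it would be
 * during reparenting) and the same CPU re-enters via tracing or NMI.
 */
static bool toy_reenter_while_locked(void)
{
	bool done;
	bool locked = toy_trylock(&toy_list_lock);

	assert(locked);
	done = toy_release_nolock();	/* re-entry on the same CPU */
	toy_unlock(&toy_list_lock);
	return done;
}
```

Outside the window toy_release_nolock() succeeds; inside it, the call
reports failure, where an unconditional lock acquisition would never
return.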

[2] Bug 2: You need to exceed the memcg limit to invoke
memcg_alloc_abort_single(), but you don't even have to be under
memory pressure to exceed it. (Yeah, I had to modify the
kernel to implement a fault-injection-like feature to trigger this.)
Unfortunately, you cannot reclaim memory in an unknown context when you
hit the limit. This should be fairly easy to trigger.

Can this cause an actual deadlock? That depends on the chances
of calling kmalloc/kfree_nolock() within the slab allocator.

--
Cheers,
Harry / Hyeonggon
