Re: [PATCH 1/2] mm/percpu: Preserve NOFS/NOIO scope during chunk create and populate

From: Vlastimil Babka (SUSE)

Date: Tue Jun 02 2026 - 03:22:10 EST

On 6/2/26 05:03, Kaitao Cheng wrote:
>
>
> On 2026/6/1 23:45, Michal Hocko wrote:
>> On Mon 01-06-26 10:27:53, Kaitao Cheng wrote:
>>> However, if we revert 9a5b183941b, it seems that all of these issues would
>>> be resolved. The only downside is that the failure rate of pcpu_alloc_noprof()
>>> allocations may increase, which might be acceptable.
>>
>> That has practical impact on some versions of iscsid which do not have
>> PR_SET_IO_FLUSHER. And maybe some more so I would rather not revert
>> based on theoretical concerns, which I believe is the case here.
>>
>
> Based on the previous discussion, I think we have a way to address most
> of the concurrency issues around percpu allocation.
>
> However, there still seems to be one remaining case that I do not yet
> have a good way to solve. For example:
>
> Thread A calls pcpu_alloc_noprof() with GFP_KERNEL and takes
> pcpu_alloc_mutex. Since the internal allocation is not constrained by
> NOFS, it may enter FS reclaim while still holding pcpu_alloc_mutex,
> creating a dependency like:
>
> pcpu_alloc_mutex -> fs_reclaim -> FS lock
>
> At the same time, Thread B may already hold an FS lock and then call
> pcpu_alloc_noprof() with GFP_NOFS. It will try to acquire
> pcpu_alloc_mutex and block, creating the reverse dependency:
>
> FS lock -> pcpu_alloc_mutex
>
> This can still form a potential deadlock cycle.
>
> Does anyone have a good suggestion for how to handle this remaining case?
> Or should we simply treat all GFP_KERNEL/GFP_NOFS allocation behavior in
> pcpu_alloc_noprof() as GFP_NOIO?
>
> If there is no clear solution for now, would it be acceptable to first
> fix some of the issues introduced by commit 9a5b183941b, and leave this
> remaining case as a pre-existing historical issue to be handled separately
> later?

We don't need to solve issues that are only theoretical and based on
scenarios that nobody sane should be doing. Pedro already pointed this out:
"As in no reclaim path should be insane^W daring enough to do pcpu allocations?"

If anyone would (start to) do that, we would likely have lockdep reports
from the testing bots, which warn that the scenario can now exist, even
before it results in an actual deadlock.
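
To illustrate why a report can come before any real deadlock, here is a toy
userspace sketch in C. It is not kernel code and not how lockdep is actually
implemented; the enum, the edge table and the function names are all
hypothetical. It only records "B was acquired while holding A" edges between
lock classes and refuses an edge that would close a cycle, which is the idea
behind the warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after the ones in this thread. */
enum lock_class { PCPU_ALLOC_MUTEX, FS_RECLAIM, FS_LOCK, NR_CLASSES };

/* edge[a][b] is true once "b acquired while holding a" has been seen. */
static bool edge[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_CLASSES; next++)
        if (edge[from][next] && !seen[next] &&
            reachable((enum lock_class)next, to, seen))
            return true;
    return false;
}

/*
 * Record the ordering "held -> acquired". Returns false and records nothing
 * if 'held' is already reachable from 'acquired', i.e. the new edge would
 * close a cycle. No thread has to actually block for this to be noticed.
 */
static bool record_dependency(enum lock_class held, enum lock_class acquired)
{
    bool seen[NR_CLASSES] = { false };

    if (reachable(acquired, held, seen))
        return false;
    edge[held][acquired] = true;
    return true;
}

/* Replay the orderings from the quoted scenario; true if a cycle is flagged. */
static bool thread_scenario_is_flagged(void)
{
    /* Thread A: GFP_KERNEL allocation under pcpu_alloc_mutex enters reclaim. */
    assert(record_dependency(PCPU_ALLOC_MUTEX, FS_RECLAIM));
    assert(record_dependency(FS_RECLAIM, FS_LOCK));
    /* Thread B: holds an FS lock, then wants pcpu_alloc_mutex. */
    return !record_dependency(FS_LOCK, PCPU_ALLOC_MUTEX);
}
```

In this model the third ordering is rejected as soon as it is observed, even
though the two threads never ran concurrently.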

Elsewhere Pedro said "The proper way of fixing this would probably be to
release pcpu_alloc_mutex (or not have it in the first place!) while you're
allocating memory."
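
As a rough userspace analogy of that pattern (a sketch only, using pthreads
and malloc() as stand-ins; alloc_mutex, spare_chunk and ensure_spare_chunk()
are made-up names, not anything in mm/percpu.c): do the allocation with no
lock held, then take the lock only to install the result, and free it if
somebody else won the race.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the mutex and the state it protects. */
static pthread_mutex_t alloc_mutex = PTHREAD_MUTEX_INITIALIZER;
static void *spare_chunk;

/* Make sure a spare chunk exists. Returns 0 on success, -1 on failure. */
static int ensure_spare_chunk(size_t size)
{
    void *fresh;
    int have;

    assert(size > 0);

    /* Peek under the lock, but never allocate while holding it. */
    pthread_mutex_lock(&alloc_mutex);
    have = spare_chunk != NULL;
    pthread_mutex_unlock(&alloc_mutex);
    if (have)
        return 0;

    /* Allocation happens with the mutex dropped. */
    fresh = malloc(size);
    if (!fresh)
        return -1;

    /* Retake the lock only to publish; another thread may have raced us. */
    pthread_mutex_lock(&alloc_mutex);
    if (!spare_chunk) {
        spare_chunk = fresh;
        fresh = NULL;
    }
    pthread_mutex_unlock(&alloc_mutex);

    free(fresh); /* no-op when our chunk was installed */
    return 0;
}
```

The cost of this shape is the recheck after relocking and the occasional
wasted allocation, which is part of what "feasible to do cleanly" would have
to cover in the real code.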

Such a refactoring might be worth it (if it's feasible to do cleanly and
doesn't come with downsides) just to eliminate these lock dependencies
properly for good. Patching over individual theoretical issues is IMHO not
worth it.