Re: [PATCH 3/3] Revert "lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly"
From: Daniel Wagner
Date: Mon Mar 02 2026 - 09:11:56 EST
Hi Ming,
Sorry for the late response. The mail server took a break last week...
On Thu, Feb 26, 2026 at 10:04:18PM +0800, Ming Lei wrote:
> On Thu, Feb 26, 2026 at 02:40:37PM +0100, Daniel Wagner wrote:
> > This reverts commit 0263f92fadbb9d294d5971ac57743f882c93b2b3.
> >
> > The reason the lock was removed was that the nvme-pci driver reset
> > handler attempted to acquire the CPU read lock during CPU hotplug
> > offlining (holds the CPU write lock). Consequently, the block layer
> > offline notifier callback could not progress because in-flight requests
> > were detected.
> >
> > Since then, in-flight detection has been improved, and the nvme-pci
> > driver now explicitly updates the hctx state when it is safe to ignore
> > detected in-flight requests. As a result, it's possible to reintroduce
> > the CPU read lock in group_cpus_evenly.
>
> Can you explain your motivation a bit? Especially adding back the lock
> causes the API hard to use. Any benefit?
Sure. I would like to add the lock back to group_cpus_evenly so that it's
possible to add support for the isolcpus use case. For isolcpus, it's
necessary to access the cpu_online_mask when creating a housekeeping CPU
mask. I failed to find a good solution that doesn't introduce horrible
hacks (see Thomas' feedback on this [1]).
Anyway, I am not totally set on this solution, but I think having a proper
lock in this code path would make the isolcpus extension way cleaner.
What exactly do you mean by 'API hard to use'? Is the problem that the
caller/driver has to make sure it doesn't do anything like what the
nvme-pci driver did?
[1] https://lore.kernel.org/linux-nvme/87cy7vrbc4.ffs@tglx/
Thanks,
Daniel