Re: [QUESTION/REGRESSION] Unbound kthreads scheduled on nohz_full CPUs after commit 041ee6f3727a

From: Frederic Weisbecker

Date: Sun Mar 22 2026 - 18:33:44 EST


On Mon, Mar 23, 2026 at 01:48:07AM +0800, sheviks wrote:
> Hi Frederic, Waiman and maintainers,
>
> A quick follow-up on my previous report. After leaving the system idle
> for a longer period, I made an observation that pinpoints the issue
> more precisely.
>
> The cgroup v2 dynamic isolation does eventually work. I noticed that
> the unbound kthreads are eventually migrated to the housekeeping CPU
> (CPU 0), but only after they wake up from sleep and enter the running
> state. This lazy migration highlights why commit 041ee6f3727a is
> causing issues for setups using nohz_full= without isolcpus=:
>
> 1. At boot time, because isolcpus= is absent, the HK_TYPE_DOMAIN mask
> includes the nohz_full CPUs.
>
> 2. When unbound kthreads are initially spawned or have their affinity
> set, the new logic relies solely on HK_TYPE_DOMAIN. Consequently, they
> are placed on the nohz_full CPUs and immediately go to sleep there.
>
> 3. They remain "trapped" on the isolated CPUs until a wake-up event
> finally forces the scheduler to migrate them according to the updated
> cgroup affinity.
>
> This brings the focus back to HK_TYPE_KTHREAD vs HK_TYPE_DOMAIN. While
> HK_TYPE_DOMAIN might default to all CPUs without isolcpus=,
> HK_TYPE_KTHREAD correctly excludes the nohz_full CPUs from the very
> beginning.
>
> Is this "spawn on nohz_full and wait for wake-up to migrate" behavior
> intended?

Yes. The affinity is applied right after the first wake-up of the kthread.
And this wake-up is supposed to happen right after the kthread creation,
because the only reason the first wake-up is deferred at all is to allow
for calling kthread_park(), kthread_bind() or kthread_affine_preferred()
between kthread_create() and wake_up_process().

So I'm wondering why you're facing such an issue, because by the time
you create cgroup isolated partitions, all kthreads should have performed
their first wake-up already.

Which kthread did you observe, after setting up the cgroup partition, that
hadn't performed its first wake-up yet?
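
To help collect that information, here is a minimal userspace sketch (not
kernel code, and not something from the commit under discussion) that parses
one line of /proc/<pid>/stat and reports the task name, whether it is a
kernel thread, and the CPU it last ran on. It assumes the field layout
described in proc(5): flags is field 9 and processor is field 39. The names
parse_stat_line(), is_kthread() and struct stat_info are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* PF_KTHREAD as defined in include/linux/sched.h */
#define PF_KTHREAD 0x00200000UL

/* Hypothetical container for the few fields we care about. */
struct stat_info {
	char comm[64];        /* task name, without the parentheses */
	unsigned long flags;  /* field 9 of /proc/<pid>/stat */
	int last_cpu;         /* field 39: CPU the task last ran on */
};

/*
 * Parse one /proc/<pid>/stat line.
 * Returns 0 on success, -1 if the line is malformed or too short.
 */
static int parse_stat_line(const char *line, struct stat_info *out)
{
	const char *open = strchr(line, '(');
	const char *close = strrchr(line, ')');  /* comm may contain ')' */
	char buf[1024];
	char *tok;
	size_t len;
	int field = 3;  /* first field after comm is field 3 (state) */
	int seen = 0;

	if (!open || !close || close < open)
		return -1;

	len = (size_t)(close - open - 1);
	if (len >= sizeof(out->comm))
		len = sizeof(out->comm) - 1;
	memcpy(out->comm, open + 1, len);
	out->comm[len] = '\0';

	strncpy(buf, close + 1, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (tok = strtok(buf, " \n"); tok; tok = strtok(NULL, " \n"), field++) {
		if (field == 9) {
			out->flags = strtoul(tok, NULL, 10);
			seen++;
		} else if (field == 39) {
			out->last_cpu = atoi(tok);
			seen++;
		}
	}
	return seen == 2 ? 0 : -1;
}

/* True if the task flags mark a kernel thread. */
static int is_kthread(const struct stat_info *si)
{
	return (si->flags & PF_KTHREAD) != 0;
}
```

Feeding it the stat line of every entry under /proc, before and after the
partition is created, should show which kthreads still report an isolated
CPU as their last CPU.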

> To prevent these sleeping kthreads from polluting isolated
> CPUs before cgroups can intervene, should the initial affinity check
> still consider HK_TYPE_KTHREAD alongside or instead of HK_TYPE_DOMAIN?

Well, domain isolation wants unbound kthreads to move away. And nohz_full
doesn't make sense without domain isolation. So we should focus on making
things work for HK_TYPE_DOMAIN.
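
For checking the result from userspace, a small sketch along these lines
could compare a kthread's Cpus_allowed_list (from /proc/<pid>/status) with
the isolated CPU list. It parses the cpulist format the kernel prints, such
as "0,2-5", into a bitmask. It assumes at most 64 CPUs, and parse_cpulist()
is a hypothetical helper, not an existing API.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a kernel-style cpulist ("0,2-5") into a bitmask of CPUs 0..63.
 * An empty string yields an empty mask.
 * Returns 0 on success, -1 on malformed input or a CPU number above 63.
 */
static int parse_cpulist(const char *s, uint64_t *mask)
{
	*mask = 0;
	while (*s && *s != '\n') {
		char *end;
		long lo, hi, cpu;

		lo = strtol(s, &end, 10);
		if (end == s)
			return -1;
		hi = lo;
		if (*end == '-') {          /* range "lo-hi" */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo < 0 || hi < lo || hi > 63)
			return -1;
		for (cpu = lo; cpu <= hi; cpu++)
			*mask |= UINT64_C(1) << cpu;

		s = end;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return 0;
}
```

A non-zero (allowed & isolated) for an unbound kthread would mean its
affinity still covers an isolated CPU.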

Thanks.

--
Frederic Weisbecker
SUSE Labs