Re: [QUESTION/REGRESSION] Unbound kthreads scheduled on nohz_full CPUs after commit 041ee6f3727a
From: Frederic Weisbecker
Date: Mon Mar 23 2026 - 17:32:04 EST
On Mon, Mar 23, 2026 at 10:00:32AM +0800, sheviks wrote:
> On Mon, Mar 23, 2026 at 6:33 AM, Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
> > Yes. The affinity is applied right after the first wake-up of the kthread.
> > And this wake-up is supposed to happen right after the kthread creation
> > because the only purpose of this first wake-up is to allow for calling
> > kthread_park(), kthread_bind() or kthread_affine_preferred() between
> > kthread_create() and wake_up_process().
> >
> > So I'm wondering why you're facing such an issue. Because by the time
> > you create cgroup isolated partitions, all kthreads should have performed
> > their first wake-up already.
> >
> > Which kthread did you observe after cgroup setting that didn't perform its
> > first wake-up?
> >
>
> Thank you for the response. I have performed a more detailed
> observation to track the migration behavior of these kthreads over
> time.
>
> Setup and Procedure:
> Booted with: nohz_full=1-7 rcu_nocbs=1-7 irqaffinity=0 (No isolcpus).
> Monitored kthreads on CPUs 1-7 using: ps -eLo cpuid,comm | grep -e
> COMM -e "^ *[1-7] " | grep -ve "/[1-7]$" -e "kworker/[1-7]:" -e nvme0q
>
> Initially, there were 30 kthreads residing on CPUs 1-7 right after boot.
> Manually created /sys/fs/cgroup/isolated1.slice and configured
> cpuset.cpus.exclusive and cpuset.cpus.partition=isolated.
>
> Migration Timeline:
> Within 1 minute: rcuog/0 and rcuog/4 migrated back to CPU 0.
> At 2 minutes: khungtaskd and jbd2/zram0-8 migrated.
> At 8 minutes: kthreadd migrated.
> At 17 minutes: pr/legacy migrated. The count dropped to 24 kthreads.
> After 9 hours: 24 kthreads remain on CPUs 1-7.
>
> The 24 kthreads remaining on CPUs 1-7 after 9 hours:
> CPUID COMMAND
> 1 card0-crtc3
> 1 ksmd
> 1 scsi_eh_4
> 1 scsi_eh_9
> 3 pool_workqueue_release
> 4 card0-crtc0
> 4 kdevtmpfs
> 4 rcu_exp_gp_kthread_worker
> 4 scsi_eh_5
> 5 oom_reaper
> 5 rcub/0
> 5 scsi_eh_0
> 5 scsi_eh_1
> 5 scsi_eh_2
> 5 scsi_eh_8
> 6 kswapd0
> 6 psimon
> 6 scsi_eh_6
> 6 watchdogd
> 7 card0-crtc1
> 7 card0-crtc2
> 7 psimon
> 7 scsi_eh_3
> 7 scsi_eh_7
"ps -o cpuid" tells where the task is currently running or, if sleeping, where
it ran last.
The cpuids that belong to isolated CPUs you're observing on some kthreads are there
because those tasks have slept the whole time since the cpuset isolated
partition was created. Yet they have been correctly migrated to CPU 0 and that
will be displayed on "ps -o cpuid" the next time those kthreads are woken up.
Use taskset for more accurate information as to where a task is allowed to run.
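For instance, a task's affinity mask can be checked like this (a minimal
sketch; the shell's own PID is used here as a stand-in, substitute the
kthread's PID you want to inspect):

```shell
# Print the CPUs a task is allowed to run on (its affinity mask),
# independent of where it last ran. Replace with the kthread's PID,
# e.g. pid=$(pgrep -x ksmd).
pid=$$
taskset -cp "$pid"
# The same mask is also visible in procfs:
grep '^Cpus_allowed_list' "/proc/$pid/status"
```

A kthread whose mask no longer includes the isolated CPUs has been
deaffined correctly, even if "ps -o cpuid" still shows the CPU it last
ran on before the partition was created.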
Thanks.
--
Frederic Weisbecker
SUSE Labs