Re: CPU Hotplug rework

From: Peter Zijlstra
Date: Mon Mar 26 2012 - 12:47:47 EST

On Mon, 2012-03-26 at 11:22 -0400, Steven Rostedt wrote:

> The workaround I added was to do several things:
> 1) instead of blocking on the hotplug lock, try to migrate itself
> instead. If it succeeds, then we don't need to worry about this thread.
> But if the thread is pinned to the CPU, then we need to worry about it.
> I first had it block only in this case, but that wasn't good enough, so
> I let them just continue.
> 2) had the CPU thread that was created do a multi-phase. The first phase
> it still waited for the cpu ref counter to go to zero, but instead of
> having tasks block, tasks would instead try to migrate and if I could
> not then just continue.
> 3) after all the notifiers are finished, notify the created CPU thread
> to sync tasks. Now that the notifiers are done, we can make any
> remaining task block. That is, the old method is done, where the CPU
> thread waits for the ref counter to go to zero, and new tasks will block
> on the hotplug lock. Because this happens after the notifiers, we do not
> need to worry about the previous deadlocks.

So how about we add another variant of kthread_freezable_should_stop(),
maybe call it kthread_bound_should_stop(), that checks whether the cpu it's
bound to is going away and, if so, parks the thread.
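
A minimal userspace sketch of that idea, to make the intended behaviour
concrete. It is only a model under stated assumptions: struct bound_thread,
bound_should_stop() and all_bound_threads_parked() are hypothetical names
made up for illustration, not kernel APIs, and a real implementation would
sleep in the parked state rather than just record it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum bound_state { BOUND_RUNNING, BOUND_PARKED, BOUND_STOPPED };

struct bound_thread {
    int cpu;                /* CPU the thread was bound to */
    bool stop_requested;    /* models a stop request for the thread */
    enum bound_state state;
};

/*
 * Called by the thread at the top of its main loop.
 * cpu_going_down[cpu] is true once the CPU has started going offline.
 * Returns true only when the thread should exit for good.
 */
static bool bound_should_stop(struct bound_thread *t,
                              const bool *cpu_going_down, size_t ncpus)
{
    assert(t->cpu >= 0 && (size_t)t->cpu < ncpus);

    if (t->stop_requested) {
        t->state = BOUND_STOPPED;
        return true;
    }
    /* A real version would block here until the CPU comes back. */
    t->state = cpu_going_down[t->cpu] ? BOUND_PARKED : BOUND_RUNNING;
    return false;
}

/*
 * Hotplug side: true once no thread bound to `cpu` is still running,
 * i.e. each one is either parked or has stopped.
 */
static bool all_bound_threads_parked(const struct bound_thread *threads,
                                     size_t n, int cpu)
{
    for (size_t i = 0; i < n; i++) {
        if (threads[i].cpu == cpu && threads[i].state == BOUND_RUNNING)
            return false;
    }
    return true;
}
```

The point of the model is that a bound thread never migrates: it either runs
on its own cpu or does not run at all.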

Then after CPU_DOWN_PREPARE, wait for all such threads (as registered
per kthread_bind()) to pass through kthread_bound_should_stop() and get

This should restore PF_THREAD_BOUND to mean it's actually bound to this
cpu, since if the cpu goes down, the task won't actually run at all.
Which means you can again use PF_THREAD_BOUND to bypass the whole
get_online_cpus()/pin_curr_cpu() muck.

Any subsystem that can still accrue state after this (eg, softirq/rcu
and possibly kworker) needs to register a CPU_DYING or CPU_DEAD notifier
to either complete the state or take it away and give it to someone

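As an illustration of the "take it away" option, here is a small userspace
model of what such a notifier would do with leftover per-cpu state. It is a
sketch under assumptions: MODEL_NR_CPUS, struct percpu_pending and
model_cpu_dead() are hypothetical names, and the real callbacks would hang
off the kernel's cpu notifier chain rather than be called directly.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4  /* hypothetical fixed CPU count for the model */

/* Per-cpu count of work items that were queued but not yet processed. */
struct percpu_pending {
    unsigned int pending[MODEL_NR_CPUS];
};

/*
 * Models a CPU_DEAD-style callback: whatever the dead cpu still had
 * pending is handed over to a cpu that remains online.
 */
static void model_cpu_dead(struct percpu_pending *p, int dead_cpu,
                           int target_cpu)
{
    assert(dead_cpu >= 0 && dead_cpu < MODEL_NR_CPUS);
    assert(target_cpu >= 0 && target_cpu < MODEL_NR_CPUS);
    assert(dead_cpu != target_cpu);

    p->pending[target_cpu] += p->pending[dead_cpu];
    p->pending[dead_cpu] = 0;
}
```
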
> Now what are the issues we have:
> 1) We need to get tasks off the CPU going down. For most tasks this is
> not an issue. But for CPU specific kernel threads, this can be an issue.
> To get tasks off of the CPU is required before the notifiers are called.
> This is to keep them from creating work on the CPU, because after the
> notifiers, there should be no more work added to the CPU.

This is hard for things like ksoftirqd, because for as long as interrupts
are enabled we can trigger softirqs. And since we need to deal with
that, we might as well deal with it for all and not bother.

See the CPU_DYING/DEAD notifier as described above that can deal with

> 2) Some tasks are going to go down and exit. We can audit all the
> notifier callbacks for CPU offlining, and see if we can just make them
> dormant instead of killing them. As Rusty said, it may not be that
> important to save the memory of these tasks.

Right, this shouldn't be a difficult task, but isn't required for -rt
afaict, it's just good practice.

> 3) Some tasks do not go offline, instead they just get moved to another
> CPU. This is the case of ksoftirqd. As it is killed after the CPU is
> down (POST_DEAD) (at least in -rt it is).

No, we should really stop allowing tasks that were kthread_bind() to run
anywhere else. Breaking the strict affinity and letting them run
someplace else to complete their work is what gets us in a whole heap of

> All that is needed now is, at the beginning of taking the CPU down is to
> push off tasks from the CPU that may migrate. Then call the notifiers,
> and then block the rest and take the CPU down. This seems to work fine.
> It was just the implementation I proposed was a bit too ugly for
> Thomas's taste.

I really don't see the point in that.
