Re: CPU Hotplug rework

From: Peter Zijlstra
Date: Mon Mar 26 2012 - 14:41:40 EST

Next message: Greg KH: "Re: [PATCH 0/6] firmware_class: Fix problems with usermodehelper test"
Previous message: Peter Zijlstra: "Re: [PATCH 11/39] autonuma: CPU follow memory algorithm"
In reply to: Steven Rostedt: "Re: CPU Hotplug rework"
Next in thread: Rusty Russell: "Re: CPU Hotplug rework"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, 2012-03-26 at 13:05 -0400, Steven Rostedt wrote:
> On Mon, 2012-03-26 at 18:13 +0200, Peter Zijlstra wrote:
> > On Mon, 2012-03-26 at 11:22 -0400, Steven Rostedt wrote:
>
> > So how about we add another variant of kthread_freezable_should_stop(),
> > maybe call it kthread_bound_should_stop() that checks if the cpu it's
> > bound to goes AWOL, and if so, parks it.
>
> Do you mean to have this function automate the "park"? When it is
> called, if the cpu is going down it should simply schedule off and not
> return until the CPU comes back online?

Yep..

> Actually, why not just keep "kthread_should_stop()" and instead create a
> "kthread_park()", and call that instead of kthread_stop(). Then when the
> task calls kthread_should_stop(), that can park the thread then.

That would add an if ((current->flags & PF_THREAD_BOUND) &&
kthread_should_park(cpu)) conditional to every kthread_should_stop()
invocation. So, following the example of kthread_freezable_should_stop(),
I opted for another function.
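To make the proposed park semantics concrete, here is a userspace sketch using pthreads. It is not kernel code: bound_should_stop() is a hypothetical stand-in for the kthread_bound_should_stop() idea above, and cpu_down()/cpu_up() are hypothetical stand-ins for the hotplug side. The thread calls bound_should_stop() on every loop iteration; if its CPU has gone away it blocks there until the CPU returns or a stop is requested, and the hotplug side waits until the thread reports that it is parked.

```c
#include <pthread.h>
#include <stdbool.h>
#include <assert.h>

/* Userspace model only; none of these names are kernel API. */
struct bound_thread {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool cpu_online;  /* is the CPU this thread is bound to online? */
	bool should_stop; /* a kthread_stop()-style exit was requested */
	bool parked;      /* thread is blocked at the park point */
	long work_done;   /* loop iterations executed */
};

/* Hypothetical model of kthread_bound_should_stop(): park while the bound
 * CPU is down; return true only when the thread should exit. */
static bool bound_should_stop(struct bound_thread *t)
{
	bool stop;

	pthread_mutex_lock(&t->lock);
	while (!t->cpu_online && !t->should_stop) {
		if (!t->parked) {
			t->parked = true;
			pthread_cond_broadcast(&t->cond); /* wake cpu_down() */
		}
		pthread_cond_wait(&t->cond, &t->lock);
	}
	t->parked = false;
	stop = t->should_stop;
	pthread_mutex_unlock(&t->lock);
	return stop;
}

static void *bound_thread_fn(void *arg)
{
	struct bound_thread *t = arg;

	while (!bound_should_stop(t)) {
		pthread_mutex_lock(&t->lock);
		t->work_done++; /* per-CPU work would go here */
		pthread_mutex_unlock(&t->lock);
	}
	return NULL;
}

/* Hotplug side: mark the CPU gone, then wait for the thread to park. */
static void cpu_down(struct bound_thread *t)
{
	pthread_mutex_lock(&t->lock);
	t->cpu_online = false;
	while (!t->parked)
		pthread_cond_wait(&t->cond, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

/* Bring the CPU back and release the parked thread. */
static void cpu_up(struct bound_thread *t)
{
	pthread_mutex_lock(&t->lock);
	t->cpu_online = true;
	pthread_cond_broadcast(&t->cond);
	pthread_mutex_unlock(&t->lock);
}

/* Runs one down/up cycle; returns 0 if the thread was parked while down. */
static int park_demo(void)
{
	struct bound_thread t = { .cpu_online = true };
	pthread_t tid;
	bool was_parked;

	pthread_mutex_init(&t.lock, NULL);
	pthread_cond_init(&t.cond, NULL);
	if (pthread_create(&tid, NULL, bound_thread_fn, &t) != 0)
		return -1;

	cpu_down(&t);
	pthread_mutex_lock(&t.lock);
	was_parked = t.parked;
	pthread_mutex_unlock(&t.lock);
	cpu_up(&t);

	pthread_mutex_lock(&t.lock);
	t.should_stop = true;
	pthread_cond_broadcast(&t.cond);
	pthread_mutex_unlock(&t.lock);
	pthread_join(tid, NULL);

	pthread_mutex_destroy(&t.lock);
	pthread_cond_destroy(&t.cond);
	return was_parked ? 0 : 1;
}
```

Keeping the park check in its own helper, rather than inside every kthread_should_stop() call, means threads that are not bound to a CPU never pay for the extra test.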

Note that kernel/workqueue.c should be fixed to use kthread_stop() or
whatever variant we implement, as it currently uses a home-brewed
solution to stop threads.

> > Then after CPU_DOWN_PREPARE, wait for all such threads (as registered
> > per kthread_bind()) to pass through kthread_bound_should_stop() and get
> > frozen.
>
> We could have the notifiers call kthread_park().

You mean to avoid having to track them through kthread_bind() ?

The advantage of tracking them is that it's harder to 'forget' about one.
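A minimal single-threaded sketch of the tracking approach, assuming a hypothetical per-CPU registry that model_kthread_bind() fills in: once every bound thread is recorded at bind time, the hotplug path can walk the list for the outgoing CPU instead of relying on each subsystem's notifier to remember its own threads. The wait for each thread to actually reach its park point is reduced here to setting a flag; all names and sizes are illustrative, not kernel API.

```c
#include <assert.h>

#define MODEL_NR_CPUS   4  /* hypothetical, small for illustration */
#define MODEL_MAX_BOUND 16 /* hypothetical per-CPU capacity */

struct bound_task {
	const char *name;
	int cpu;    /* CPU the task is bound to, -1 if unbound */
	int parked; /* set once the hotplug path has parked it */
};

/* Per-CPU registry of bound threads, filled in at bind time. */
static struct bound_task *bound_tasks[MODEL_NR_CPUS][MODEL_MAX_BOUND];
static int nr_bound_tasks[MODEL_NR_CPUS];

/* Hypothetical model of kthread_bind() that also records the task. */
static int model_kthread_bind(struct bound_task *p, int cpu)
{
	if (cpu < 0 || cpu >= MODEL_NR_CPUS ||
	    nr_bound_tasks[cpu] == MODEL_MAX_BOUND)
		return -1;
	p->cpu = cpu;
	bound_tasks[cpu][nr_bound_tasks[cpu]++] = p;
	return 0;
}

/* After CPU_DOWN_PREPARE: park every task recorded for @cpu.
 * Returns how many tasks were parked. */
static int model_park_bound_tasks(int cpu)
{
	int i;

	for (i = 0; i < nr_bound_tasks[cpu]; i++)
		bound_tasks[cpu][i]->parked = 1;
	return nr_bound_tasks[cpu];
}
```

Because registration happens inside the bind step itself, a new per-CPU thread cannot be missed by the hotplug path the way a forgotten notifier could be.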

> > This should restore PF_THREAD_BOUND to mean it's actually bound to this
> > cpu, since if the cpu goes down, the task won't actually run at all.
> > Which means you can again use PF_THREAD_BOUND to bypass the whole
> > get_online_cpus()/pin_curr_cpu() muck.
> >
> > Any subsystem that can still accrue state after this (e.g. softirq/RCU
> > and possibly kworker) needs to register a CPU_DYING or CPU_DEAD notifier
> > to either complete the state or take it away and give it to someone
> > else.
>
> I'm afraid that this part is easier said than done.

Got anything particularly difficult in mind?

Workqueues can give the gcwq to unbound threads -- it doesn't guarantee
the per-cpu-ness of work items anyway.

Softirqs can be run from CPU_DYING since interrupts will never be
enabled again at that point.

RCU would have to make sure the cpu doesn't complete a grace period, and
fix up from CPU_DEAD: have it complete any outstanding grace periods,
move it to extended idle and steal the callback list.

I'm not sure there's anything really hard there.
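The "steal the callback list" step can be sketched as an O(1) splice of one per-CPU list onto another. This is a userspace model, not RCU's actual data structures: struct cb, struct cb_list and cb_steal() are hypothetical names, and the list keeps a tail pointer so that appending a whole list does not require walking it. The same hand-off shape would apply to any per-CPU queue that a CPU_DEAD notifier migrates to a surviving CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback; stands in for a queued per-CPU item. */
struct cb {
	struct cb *next;
	int id;
};

/* Singly linked list with a tail pointer for O(1) append and splice. */
struct cb_list {
	struct cb *head;
	struct cb **tail; /* &head when empty, else &last->next */
};

static void cb_list_init(struct cb_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

static void cb_enqueue(struct cb_list *l, struct cb *c)
{
	c->next = NULL;
	*l->tail = c;
	l->tail = &c->next;
}

/* CPU_DEAD-style hand-off: append everything queued on the dead CPU's
 * list to a surviving CPU's list, keeping order, and empty the source. */
static void cb_steal(struct cb_list *dst, struct cb_list *src)
{
	if (!src->head)
		return;
	*dst->tail = src->head;
	dst->tail = src->tail;
	cb_list_init(src);
}

static int cb_count(const struct cb_list *l)
{
	int n = 0;
	const struct cb *c;

	for (c = l->head; c; c = c->next)
		n++;
	return n;
}
```

Appending the stolen callbacks after the survivor's own preserves their relative order, and resetting the source list leaves nothing behind on the dead CPU.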

> > > Now what are the issues we have:
> > >
> > > 1) We need to get tasks off the CPU going down. For most tasks this is
> > > not an issue. But for CPU specific kernel threads, this can be an issue.
> > > Getting tasks off the CPU is required before the notifiers are called.
> > > This is to keep them from creating work on the CPU, because after the
> > > notifiers, there should be no more work added to the CPU.
> >
> > This is hard for things like ksoftirqd, because for as long as interrupts
> > are enabled we can trigger softirqs. And since we need to deal with
> > that, we might as well deal with it for all and not bother.
>
> Heh, at least for -rt we don't need to worry about that, since interrupts
> are threads and are moved to other CPUs. Although I'm not sure that's
> true about the timer softirq.

It's a problem for -rt, since as long as interrupts are enabled (and we
can schedule), interrupts can come in and wake their respective threads;
this can happen during the CPU_DOWN_PREPARE notifier just fine.

For both -rt and mainline we can schedule right up until we call
stop_machine. Mainline (!threadirq) will continue servicing interrupts
for another few instructions until the stop_machine bits disable
interrupts on all cpus. The difference is really not that big.

> > > 3) Some tasks do not go offline; instead they just get moved to another
> > > CPU. This is the case of ksoftirqd, as it is killed after the CPU is
> > > down (POST_DEAD) (at least in -rt it is).
> >
> > No, we should really stop allowing tasks bound with kthread_bind() to run
> > anywhere else. Breaking the strict affinity and letting them run
> > someplace else to complete their work is what gets us into a whole heap
> > of trouble.
>
> Agreed, but fixing this is not an easy problem.

I'm not sure it's that hard, just work.

If we get the above stuff done, we should be able to put BUG_ON(p->flags
& PF_THREAD_BOUND) in select_fallback_rq().
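The invariant can be sketched as an assertion at the top of a fallback-CPU picker. This is a simplified userspace model, not the kernel's select_fallback_rq() (which also considers NUMA locality and cpusets): MODEL_PF_THREAD_BOUND, struct fallback_task and the bitmask search are assumptions for illustration, and assert() stands in for BUG_ON(). If bound kthreads are parked instead of migrated, only unbound tasks should ever reach this path.

```c
#include <assert.h>

#define MODEL_PF_THREAD_BOUND 0x1u /* hypothetical stand-in for PF_THREAD_BOUND */
#define MODEL_NR_CPUS 32           /* one bit per CPU in an unsigned int mask */

struct fallback_task {
	unsigned int flags;
	unsigned int allowed_mask; /* CPUs the task may run on */
};

/* Pick a new CPU for @p after its CPU went away; -1 if none is online. */
static int model_select_fallback_cpu(const struct fallback_task *p,
				     unsigned int online_mask)
{
	unsigned int m;
	int cpu;

	/* BUG_ON() analogue: bound kthreads should be parked, never migrated. */
	assert(!(p->flags & MODEL_PF_THREAD_BOUND));

	m = p->allowed_mask & online_mask;
	if (!m)
		m = online_mask; /* last resort: break the task's affinity */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (m & (1u << cpu))
			return cpu;
	return -1;
}
```

Putting the check at the single place where affinity may be broken is what lets debug infrastructure enforce the stronger PF_THREAD_BOUND semantics rather than merely document them.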

Also, I think you should opt for the solution that has the
cleanest/strongest semantics so you can add more debug infrastructure
around it to enforce it.
