Re: [PATCH 2/3] sched: terminate newidle balancing once at least one task has moved over

From: Gregory Haskins
Date: Wed Jul 09 2008 - 08:01:17 EST


>>> On Wed, Jul 9, 2008 at 7:17 AM, in message
<200807092117.01669.nickpiggin@xxxxxxxxxxxx>, Nick Piggin
<nickpiggin@xxxxxxxxxxxx> wrote:
> On Wednesday 09 July 2008 20:53, Gregory Haskins wrote:
>> >>> On Wed, Jul 9, 2008 at 4:09 AM, in message
>>
>> <200807091809.52293.nickpiggin@xxxxxxxxxxxx>, Nick Piggin
>>
>> <nickpiggin@xxxxxxxxxxxx> wrote:
>> > On Tuesday 08 July 2008 22:37, Gregory Haskins wrote:
>> >> >>> On Tue, Jul 8, 2008 at 1:00 AM, in message
>> >>
>> >> <200807081500.18245.nickpiggin@xxxxxxxxxxxx>, Nick Piggin
>> >>
>> >> <nickpiggin@xxxxxxxxxxxx> wrote:
>> >> > On Saturday 28 June 2008 06:29, Gregory Haskins wrote:
>> >> >> Inspired by Peter Zijlstra.
>> >> >>
>> >> >> Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>
>> >> >
>> >> > What happened to the feedback I sent about this?
>> >> >
>> >> > It is still nack from me.
>> >>
>> >> Ah yes. Slipped through the cracks...sorry about that.
>> >>
>> >> What if we did "if (idle == CPU_NEWLY_IDLE && need_resched())" instead?
>> >
>> > Isn't that exactly the same thing
>>
>> Not quite. The former version would break on *any* successful enqueue (as a
>> result of a local move_task as well as a remote wake-up/migration). The
>> latter version will only break on the remote variety. You were
>> concerned about stopping a move_task operation early because it would
>> reduce efficiency, and I do not entirely disagree. However, this really
>> only concerns the local type (which has now been removed).
>>
>> Remote preemptions should (IMO) always break immediately because they would
>> likely have invalidated the f_b_g() calculation anyway, and
>> low-latency requirements dictate it's the right thing to do.
>
> I thought this was about newidle balancing? Tasks are always going to
> be coming from remote runqueues, aren't they?

Yes, but you misunderstand me. I am referring to "push" (remote moves to us) versus
"pull" (we move from remote). During a move_tasks() we sometimes have to drop the
RQ lock in double_lock_balance(). This gives a remote CPU a chance to grab the lock
and potentially move tasks to us as part of either a migration operation or a wake-up.

When this happens, two things should be noted: 1) it changes the load "landscape"
such that any previous computation in f_b_g() is potentially invalid, and 2) the task that
was moved may be higher priority and therefore should not have to wait for move_tasks() to
finish moving some arbitrary number of lower-priority tasks (and "lower priority" is
highly probable, since NEWIDLE only pulls CFS tasks, while it is typically RT tasks
that migrate to us like this).

Therefore, IMO it doesn't make sense to continue moving more load. Just stop and let the
scheduler sort it out. At the very least, it needs to recompute how much load to move.

>
>
>> > because any task will preempt the idle thread?
>>
>> During NEWIDLE this is a preempt-disabled section because we are still in
>> the middle of a schedule(). Therefore there will be no involuntary
>> preemption and that is why we are concerned with making sure we check for
>> voluntary preemption. The move_tasks() will run to completion without this
>> patch. With this patch it will break the operation if someone tries to
>> preempt us.
>
> Firstly, won't the act of pulling tasks set the need_resched condition?

Hmm, indeed. You are probably right about that, and I need some other way to distinguish
a task that was pushed to us from anything we might have pulled ourselves.

>
> Secondly, even if it does what you say, what exactly would be the difference
> between blocking a newly moved task from running and blocking a newly woken
> task from running? Either way you introduce the same worst case latency
> condition.

Tasks that are pushed to us have a good chance of being RT (since RT is a heavy user of
"push" methods, while CFS is mostly pull). Conversely, tasks that are pulled to us by
newidle are guaranteed *not* to be RT (since newidle balancing will only pull CFS tasks).

Perhaps that is the answer: terminate on rq->rt.nr_running instead of need_resched(). It's
not exactly scalable to an arbitrary arrangement of future sched_classes, but that could be
addressed when those sched_classes become available.

To be fair, I think this is what Peter was trying to do with the more elaborate version of
the patches that I based this one on.

>
>
>> I'll keep an open mind but I am pretty sure this is something we should be
>> doing. As far as I can tell, there should be no downside with this second
>> version.
>
> I don't think it has really been thought through that well. So I'm still
> against it.
>
>> As a compromise we could put an #ifdef CONFIG_PREEMPT around this
>> new logic, but I don't think it is strictly necessary.
>
> That's not very nice. It's reasonable to run with CONFIG_PREEMPT but not
> blindly want to trade latency for throughput.

How do you come to this conclusion? Continuing a move under these circumstances
(or at least, the circumstances I intend to detect) acts on stale data and
could just as well hurt throughput as help it. Since the move is
essentially arbitrary once this threshold is crossed, even the throughput
becomes non-deterministic ;)

Regards,
-Greg


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/