Re: [patch 00/16] sched: per-entity load-tracking

From: Paul Turner
Date: Fri Oct 05 2012 - 05:07:38 EST


On Mon, Sep 24, 2012 at 10:16 AM, Benjamin Segall <bsegall@xxxxxxxxxx> wrote:
> "Jan H. Schönherr" <schnhrr@xxxxxxxxxxxxxxx> writes:
>
>> Hi Paul.
>>
>> On 23.08.2012 16:14, pjt@xxxxxxxxxx wrote:
>>> Please find attached the latest version for CFS load-tracking.
>>
>> Originally, I thought this series also took care of
>> the leaf-cfs-runqueue ordering issue described here:
>>
>> http://lkml.org/lkml/2011/7/18/86
>>
>> Now that I have had a closer look, I see that it does not take
>> care of it.
>>
>> Is there still any reason why the leaf_cfs_rq list must be sorted?
>> Or could we just get rid of the ordering requirement now?
>
> Ideally yes, since a parent's __update_cfs_rq_tg_load_contrib and
> update_cfs_shares still depend on accurate values in
> runnable_load_avg/blocked_load_avg from its children. That said, nothing
> should completely fall over; it would just make load decay take longer to
> propagate to the root.
>>
>> (That seems easier than fixing the issue, as I suspect that
>> __update_blocked_averages_cpu() might still punch some holes
>> in the hierarchy in some edge cases.)
>
> Yeah, I suspect it's possible for the parent to end up with a slightly
> lower runnable_avg_sum than its child if they're both hovering around
> the max value, since the accounting isn't quite continuous, and this
> difference might be large enough to require one more tick to decay to zero.

OK, so coming back to this. I had a look at this last week and
realized I'd managed to pervert my original intent.

Specifically, the idea was that, barring numerical rounding errors
near LOAD_AVG_MAX, we can guarantee a parent's runnable average is
greater than or equal to its child's, since by definition a parent is
runnable whenever its child is runnable. Provided we fix up possible
rounding errors (e.g. with a clamp), and since a cfs_rq only leaves the
list once its runnable average has decayed to zero, this guarantees
we'll always remove child nodes before their parents.

So I did this. Then I thought: oh dear. When I'd previously proposed
the above as a resolution for out-of-order removal, I had not tackled
the problem of correct accounting on bandwidth-constrained entities.
It turns out we end up having to "stop" time to handle this
efficiently and correctly. But this means that we can then no longer
depend on the constraint above, as the sums within a sub-tree can
potentially become out of sync.

So I got back to this again tonight and spent a few hours looking at
some alternate approaches to resolve this. There are a few games we
can play here, but after all of that I now re-realize we still won't
handle an on-list grandparent correctly when the parent and child are
not on the list. This is fundamentally an issue with enqueue's
ordering: no hole punching from parent-before-child removal is
required.

I suspect we might want to do a segment splice on enqueue after all.
Let me sleep on it.

- Paul