Re: weird loadavg on idle machine post 5.7

From: Valentin Schneider
Date: Tue Jul 07 2020 - 06:21:25 EST



On 07/07/20 09:17, Peter Zijlstra wrote:
> On Tue, Jul 07, 2020 at 12:56:04AM +0100, Valentin Schneider wrote:
>
>> > @@ -2605,8 +2596,20 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>> > *
>> > * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
>> > * __schedule(). See the comment for smp_mb__after_spinlock().
>> > + *
>> > + * Form a control-dep-acquire with p->on_rq == 0 above, to ensure
>> > + * schedule()'s deactivate_task() has 'happened' and p will no longer
>> > + * care about it's own p->state. See the comment in __schedule().
>> > */
>> > - smp_rmb();
>> > + smp_acquire__after_ctrl_dep();
>>
>> Apologies for asking again, but I'm foolishly hopeful I'll someday be able
>> to grok those things without half a dozen tabs open with documentation and
>> Paul McKenney papers.
>>
>> Do I get it right that the 'acquire' part hints this is equivalent to
>> issuing a load-acquire on whatever was needed to figure out whether or not
>> the take the branch (in this case, p->on_rq, amongst other things); IOW
>> ensures any memory access appearing later in program order has to happen
>> after the load?
>>
>> That at least explains to me the load->{load,store} wording in
>> smp_acquire__after_ctrl_dep().
>
> Yes.
>
> So the thing is that hardware MUST NOT speculate stores, or rather, if
> it does, it must take extreme measures to ensure they do not become
> visible in any way shape or form, since speculative stores lead to
> instant OOTA problems.
>
> Therefore we can say that branches order stores and if the branch
> condition depends on a load, we get a load->store order. IOW the load
> must complete before we can resolve the branch, which in turn enables
> the store to become visible/happen.
>

Right, I think that point is made clear in memory-barriers.txt.

> If we then add an smp_rmb() to the branch to order load->load, we end up
> with a load->{load,store} ordering, which is equivalent to a
> load-acquire.
>
> The reason to do it like that, is that load-aquire would otherwise
> require an smp_mb(), since for many platforms that's the only barrier
> that has load->store ordering.
>
> The down-side of doing it like this, as Paul will be quick to point out,
> is that the C standard doesn't recognise control dependencies and thus
> the compiler would be in its right to 'optimize' our conditional away.
>

Yikes!

> We're relying on the compilers not having done this in the past and
> there being sufficient compiler people interested in compiling Linux to
> avoid this from happening.
>
>
> Anyway, this patch is basically:
>
> LOAD p->state LOAD-ACQUIRE p->on_rq == 0
> MB
> STORE p->on_rq, 0 STORE p->state, TASK_WAKING
>
> which ensures the TASK_WAKING store happens after the p->state load.
> Just a wee bit complicated due to not actually adding any barriers while
> adding additional ordering.
>

Your newer changelog also helped in that regards.

Thanks a ton for the writeup!

> Anyway, let me now endeavour to write a coherent Changelog for this mess
> :-(