Re: [PATCH 00/22] add support for Clang LTO

From: Peter Zijlstra
Date: Thu Jun 25 2020 - 04:58:21 EST


On Thu, Jun 25, 2020 at 10:24:33AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 25, 2020 at 10:03:13AM +0200, Peter Zijlstra wrote:

> > I'm sure Will will respond, but the basic issue is the trainwreck C11
> > made of dependent loads.
> >
> > Anyway, here's a link to the last time this came up:
> >
> > https://lore.kernel.org/linux-arm-kernel/20171116174830.GX3624@xxxxxxxxxxxxxxxxxx/
>
> Another good read:
>
> https://lore.kernel.org/lkml/20150520005510.GA23559@xxxxxxxxxxxxxxxxxx/
>
> and having (partially) re-read that, I now worry intensily about things
> like latch_tree_find(), cyc2ns_read_begin, __ktime_get_fast_ns().
>
> It looks like kernel/time/sched_clock.c uses raw_read_seqcount() which
> deviates from the above patterns by, for some reason, using a primitive
> that includes an extra smp_rmb().
>
> And this is just the few things I could remember off the top of my head,
> who knows what else is out there.

As an example, let us consider __ktime_get_fast_ns(), the critical bit
is:

seq = raw_read_seqcount_latch(&tkf->seq);
tkr = tkf->base + (seq & 0x01);
now = tkr->base;

And we hard rely on that being a dependent load, so:

LOAD seq, (tkf->seq)
LOAD tkr, tkf->base
AND seq, 1
MUL seq, sizeof(tk_read_base)
ADD tkr, seq
LOAD now, (tkr->base)

Such that we obtain 'now' as a direct dependency on 'seq'. This ensures
the loads are ordered.

A compiler can wreck this by translating it into something like:

LOAD seq, (tkf->seq)
LOAD tkr, tkf->base
AND seq, 1
CMP seq, 0
JE 1f
ADD tkr, sizeof(tk_read_base)
1:
LOAD now, (tkr->base)

Because now the machine can speculate and load now before seq, breaking
the ordering.