Re: [RFC][PATCH 0/5] arch: atomic rework

From: Michael Matz
Date: Mon Feb 24 2014 - 08:55:21 EST


Hi,

On Fri, 21 Feb 2014, Paul E. McKenney wrote:

> > And with conservative I mean "everything is a source of a dependency, and
> > hence can't be removed, reordered or otherwise fiddled with", and that
> > includes code sequences where no atomic objects are anywhere in sight [1].
> > In the light of that the only realistic way (meaning to not have to
> > disable optimization everywhere) to implement consume as currently
> > specified is to map it to acquire. At which point it becomes pointless.
>
> No, only memory_order_consume loads and [[carries_dependency]]
> function arguments are sources of dependency chains.

I don't see [[carries_dependency]] in the C11 final draft (yeah, should
get a real copy, I know, but let's assume it's the same language as the
standard). Therefore, yes, only consume loads are sources of
dependencies. The problem with the definition of the "carries a
dependency" relation is not the sources, but rather where it stops.
It's transitively closed over "value of evaluation A is used as operand in
evaluation B", with very few exceptions as per 5.1.2.4#14. Evaluations
can contain function calls, so if there's _any_ chance that an operand of
an evaluation might even indirectly use something resulting from a consume
load then that evaluation must be compiled in a way to not break
dependency chains.

I don't see a way to generally assume that e.g. the value of a function
argument can impossibly result from a consume load, therefore the compiler
must assume that all function arguments _can_ result from such loads, and
so must disable all depchain breaking optimization (which are many).

> > [1] Simple example of what type of transformations would be disallowed:
> >
> > int getzero (int i) { return i - i; }
>
> This needs to be as follows:
>
> [[carries_dependency]] int getzero(int i [[carries_dependency]])
> {
> return i - i;
> }
>
> Otherwise dependencies won't get carried through it.

So, with the above do you agree that in absense of any other magic (see
below) the compiler is not allowed to transform my initial getzero()
(without the carries_dependency markers) implementation into "return 0;"
because of the C11 rules for "carries-a-dependency"?

If so, do you then also agree that the specification of "carries a
dependency" is somewhat, err, shall we say, overbroad?

> > depchains don't matter, could _then_ optmize it to zero. But that's
> > insane, especially considering that it's hard to detect if a given context
> > doesn't care for depchains, after all the depchain relation is constructed
> > exactly so that it bleeds into nearly everywhere. So we would most of
> > the time have to assume that the ultimate context will be depchain-aware
> > and therefore disable many transformations.
>
> Any function that does not contain a memory_order_consume load and that
> doesn't have any arguments marked [[carries_dependency]] can be
> optimized just as before.

And as such marker doesn't exist we must conservatively assume that it's
on _all_ parameters, so I'll stand by my claim.

> > Then inlining getzero would merely add another "# j.dep = i.dep"
> > relation, so depchains are still there but the value optimization can
> > happen before inlining. Having to do something like that I'd find
> > disgusting, and rather rewrite consume into acquire :) Or make the
> > depchain relation somehow realistically implementable.
>
> I was actually OK with arithmetic cancellation breaking the dependency
> chains. Others on the committee felt otherwise, and I figured that (1)
> I wouldn't be writing that kind of function anyway and (2) they knew
> more about writing compilers than I. I would still be OK saying that
> things like "i-i", "i*0", "i%1", "i&0", "i|~0" and so on just break the
> dependency chain.

Exactly. I can see the problem that people had with that, though. There
are very many ways to write conceiled zeros (or generally neutral elements
of the function in question). My getzero() function is one (it could e.g.
be an assembler implementation). The allowance to break dependency chains
would have to apply to such cancellation as well, and so can't simply
itemize all cases in which cancellation is allowed. Rather it would have
had to argue about something like "value dependency", ala "evaluation B
depends on A, if there exist at least two different values A1 and A2
(results from A), for which evaluation B (with otherwise same operands)
yields different values B1 and B2".

Alas, it doesn't, except if you want to understand the term "the value of
A is used as an operand of B" in that way. Even then you'd still have the
second case of the depchain definition, via intermediate not even atomic
memory stores and loads to make two evaluations be ordered per
carries-a-dependency.

And even that understanding of "is used" wouldn't be enough, because there
are cases where the cancellation happens in steps, and where it interacts
with the third clause (transitiveness): Assume this:

a = something() // evaluation A
b = 1 - a // evaluation B
c = a - 1 + b // evaluation C

Now, clearly B depends on A. Also C depends on B (because with otherwise
same operands changing just B also changes C), because of transitiveness C
then also depends on A. But equally cleary C was just an elaborate way to
write "0", and so depends on nothing. The problem was of course that A
and B weren't independent when determining the dependencies of C. But
allowing cancellation to break dependency chains would have to allow for
these cases as well.

So, now, that leaves us basically with depchains forcing us to disable
many useful transformation or finding some other magic. One would be to
just regard all consume loads as acquire loads and be done (and
effectively remove the ill-advised "carries a dependency" relation from
consideration).

You say downthread that it'd also be possible to just emit barriers before
all function calls (I say "all" because the compiler will generally
have applied some transformation that broke depchains if they existed).
That seems to me to be a bigger hammer than just ignoring depchains and
emit acquires instead of consumes (because the latter changes only exactly
where atomics are used, the former seems to me to have unbounded effect).

So, am still missing something or is my understanding of the
carries-a-dependency relation correct and my conclusions are merely too
pessimistic?


Ciao,
Michael.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/