Re: [RFC][PATCH 0/5] arch: atomic rework

From: Peter Sewell
Date: Wed Mar 05 2014 - 13:37:38 EST

Next message: Markus Elfring: "[PATCH 5/5] Addition of a make file for build automation"
Previous message: Marcel Holtmann: "Re: [PATCH] Revert "Bluetooth: Use devname:vhci module alias for virtual HCI driver""
In reply to: Torvald Riegel: "Re: [RFC][PATCH 0/5] arch: atomic rework"
Next in thread: Torvald Riegel: "Re: [RFC][PATCH 0/5] arch: atomic rework"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 5 March 2014 17:15, Torvald Riegel <triegel@xxxxxxxxxx> wrote:
> On Tue, 2014-03-04 at 22:11 +0000, Peter Sewell wrote:
>> On 3 March 2014 20:44, Torvald Riegel <triegel@xxxxxxxxxx> wrote:
>> > On Sun, 2014-03-02 at 04:05 -0600, Peter Sewell wrote:
>> >> On 1 March 2014 08:03, Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>> >> > On Sat, Mar 01, 2014 at 04:06:34AM -0600, Peter Sewell wrote:
>> >> >> Hi Paul,
>> >> >>
>> >> >> On 28 February 2014 18:50, Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>> >> >> > On Thu, Feb 27, 2014 at 12:53:12PM -0800, Paul E. McKenney wrote:
>> >> >> >> On Thu, Feb 27, 2014 at 11:47:08AM -0800, Linus Torvalds wrote:
>> >> >> >> > On Thu, Feb 27, 2014 at 11:06 AM, Paul E. McKenney
>> >> >> >> > <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>> >> >> >> > >
>> >> >> >> > > 3. The comparison was against another RCU-protected pointer,
>> >> >> >> > > where that other pointer was properly fetched using one
>> >> >> >> > > of the RCU primitives. Here it doesn't matter which pointer
>> >> >> >> > > you use. At least as long as the rcu_assign_pointer() for
>> >> >> >> > > that other pointer happened after the last update to the
>> >> >> >> > > pointed-to structure.
>> >> >> >> > >
>> >> >> >> > > I am a bit nervous about #3. Any thoughts on it?
>> >> >> >> >
>> >> >> >> > I think that it might be worth pointing out as an example, and saying
>> >> >> >> > that code like
>> >> >> >> >
>> >> >> >> > p = atomic_read(consume);
>> >> >> >> > X;
>> >> >> >> > q = atomic_read(consume);
>> >> >> >> > Y;
>> >> >> >> > if (p == q)
>> >> >> >> > data = p->val;
>> >> >> >> >
>> >> >> >> > then the access of "p->val" is constrained to be data-dependent on
>> >> >> >> > *either* p or q, but you can't really tell which, since the compiler
>> >> >> >> > can decide that the values are interchangeable.
>> >> >> >> >
>> >> >> >> > I cannot for the life of me come up with a situation where this would
>> >> >> >> > matter, though. If "X" contains a fence, then that fence will be a
>> >> >> >> > stronger ordering than anything the consume through "p" would
>> >> >> >> > guarantee anyway. And if "X" does *not* contain a fence, then the
>> >> >> >> > atomic reads of p and q are unordered *anyway*, so then whether the
>> >> >> >> > ordering to the access through "p" is through p or q is kind of
>> >> >> >> > irrelevant. No?
>> >> >> >>
>> >> >> >> I can make a contrived litmus test for it, but you are right, the only
>> >> >> >> time you can see it happen is when X has no barriers, in which case
>> >> >> >> you don't have any ordering anyway -- both the compiler and the CPU can
>> >> >> >> reorder the loads into p and q, and the read from p->val can, as you say,
>> >> >> >> come from either pointer.
>> >> >> >>
>> >> >> >> For whatever it is worth, hear is the litmus test:
>> >> >> >>
>> >> >> >> T1: p = kmalloc(...);
>> >> >> >> if (p == NULL)
>> >> >> >> deal_with_it();
>> >> >> >> p->a = 42; /* Each field in its own cache line. */
>> >> >> >> p->b = 43;
>> >> >> >> p->c = 44;
>> >> >> >> atomic_store_explicit(&gp1, p, memory_order_release);
>> >> >> >> p->b = 143;
>> >> >> >> p->c = 144;
>> >> >> >> atomic_store_explicit(&gp2, p, memory_order_release);
>> >> >> >>
>> >> >> >> T2: p = atomic_load_explicit(&gp2, memory_order_consume);
>> >> >> >> r1 = p->b; /* Guaranteed to get 143. */
>> >> >> >> q = atomic_load_explicit(&gp1, memory_order_consume);
>> >> >> >> if (p == q) {
>> >> >> >> /* The compiler decides that q->c is same as p->c. */
>> >> >> >> r2 = p->c; /* Could get 44 on weakly order system. */
>> >> >> >> }
>> >> >> >>
>> >> >> >> The loads from gp1 and gp2 are, as you say, unordered, so you get what
>> >> >> >> you get.
>> >> >> >>
>> >> >> >> And publishing a structure via one RCU-protected pointer, updating it,
>> >> >> >> then publishing it via another pointer seems to me to be asking for
>> >> >> >> trouble anyway. If you really want to do something like that and still
>> >> >> >> see consistency across all the fields in the structure, please put a lock
>> >> >> >> in the structure and use it to guard updates and accesses to those fields.
>> >> >> >
>> >> >> > And here is a patch documenting the restrictions for the current Linux
>> >> >> > kernel. The rules change a bit due to rcu_dereference() acting a bit
>> >> >> > differently than atomic_load_explicit(&p, memory_order_consume).
>> >> >> >
>> >> >> > Thoughts?
>> >> >>
>> >> >> That might serve as informal documentation for linux kernel
>> >> >> programmers about the bounds on the optimisations that you expect
>> >> >> compilers to do for common-case RCU code - and I guess that's what you
>> >> >> intend it to be for. But I don't see how one can make it precise
>> >> >> enough to serve as a language definition, so that compiler people
>> >> >> could confidently say "yes, we respect that", which I guess is what
>> >> >> you really need. As a useful criterion, we should aim for something
>> >> >> precise enough that in a verified-compiler context you can
>> >> >> mathematically prove that the compiler will satisfy it (even though
>> >> >> that won't happen anytime soon for GCC), and that analysis tool
>> >> >> authors can actually know what they're working with. All this stuff
>> >> >> about "you should avoid cancellation", and "avoid masking with just a
>> >> >> small number of bits" is just too vague.
>> >> >
>> >> > Understood, and yes, this is intended to document current compiler
>> >> > behavior for the Linux kernel community. It would not make sense to show
>> >> > it to the C11 or C++11 communities, except perhaps as an informational
>> >> > piece on current practice.
>> >> >
>> >> >> The basic problem is that the compiler may be doing sophisticated
>> >> >> reasoning with a bunch of non-local knowledge that it's deduced from
>> >> >> the code, neither of which are well-understood, and here we have to
>> >> >> identify some envelope, expressive enough for RCU idioms, in which
>> >> >> that reasoning doesn't allow data/address dependencies to be removed
>> >> >> (and hence the hardware guarantee about them will be maintained at the
>> >> >> source level).
>> >> >>
>> >> >> The C11 syntactic notion of dependency, whatever its faults, was at
>> >> >> least precise, could be reasoned about locally (just looking at the
>> >> >> syntactic code in question), and did do that. The fact that current
>> >> >> compilers do optimisations that remove dependencies and will likely
>> >> >> have many bugs at present is besides the point - this was surely
>> >> >> intended as a *new* constraint on what they are allowed to do. The
>> >> >> interesting question is really whether the compiler writers think that
>> >> >> they *could* implement it in a reasonable way - I'd like to hear
>> >> >> Torvald and his colleagues' opinion on that.
>> >> >>
>> >> >> What you're doing above seems to be basically a very cut-down version
>> >> >> of that, but with a fuzzy boundary. If you want it to be precise,
>> >> >> maybe it needs to be much simpler (which might force you into ruling
>> >> >> out some current code idioms).
>> >> >
>> >> > I hope that Torvald Riegel's proposal (https://lkml.org/lkml/2014/2/27/806)
>> >> > can be developed to serve this purpose.
>> >>
>> >> (I missed that mail when it first came past, sorry)
>> >>
>> >> That's also going to be tricky, I'm afraid. The key condition there is:
>> >>
>> >> "* at the time of execution of E, L [PS: I assume that L is a
>> >> typo and should be E]
>> >
>> > No, L was intended. (But see below.)
>>
>> then I misunderstood...
>>
>> >> can possibly have returned at
>> >> least two different values under the assumption that L itself
>> >> could have returned any value allowed by L's type."
>> >>
>> >> First, the evaluation of E might be nondeterministic - e.g., for an
>> >> artificial example, if it's just a nondeterministic value obtained
>> >> from the result of a race on SC atomics. The above doesn't
>> >> distinguish between that (which doesn't have a real dependency on L)
>> >> and that XOR'd with L (which does). And it does so in the wrong
>> >> direction: it'll say there the former has a dependency on L.
>> >
>> > I'm not quite sure I understand the examples you want to point out
>> > (could you add brief code snippets, perhaps?) -- but the informal
>> > definition I proposed also says that E must have used L in some way to
>> > make it's computation. That's pretty vague, and the way I phrased the
>> > requirement is probably not optimal.
>>
>> ...so this example isn't so relevant, but it's maybe interesting
>> anyway. In free-wheeling psuedocode:
>>
>> x = read_consume(L) // a memory_order_consume read of a 1-bit value from L
>> y = nondet() // some code that
>> nondeterministically produces another 1-bit value, eg by spawning two
>> threads that each do an SC-atomic write to some other location,
>> returning 0 or 1 depending on which wins
>>
>> then evaluate either just
>>
>> z=y
>>
>> or
>>
>> z = x XOR y
>>
>> In my misinterpretation of what you wrote, your definition would say
>> there's a dependency from the load of L to the evalution of y, even
>> though there isn't.
>
> The evaluations "z==y" or "z=y" wouldn't depend on L, assuming nondet()
> doesn't depend on L (e.g., by using x). They don't use x's value. (And
> I don't have a precise definition for this, unfortunately, but I hope
> the intent is clear.)
>
> "x XOR y" would depend on L, because this does take x into account, and
> x's value remains relevant irrespective of which value y has. y would
> be non-vdp, and the compiler could have specialized the code into two
> branches for both 1 and 0 values of y; but it would still need x to
> compute z in the last evaluation.
>
>>
>> > So let me expand a bit on the background first. What I tried to capture
>> > with the rules is that an evaluation (ie, E), really uses the value
>> > returned by L, and not just a, for example, constant value that can be
>> > inferred from whatever was executed between L and E. This (is intended
>> > to) prevent value prediction and such by programmers. The compiler in
>> > turn knows what a real program is allowed to do that still wants to rely
>> > on the ordering by the consume. Basing all of this on the values seemed
>> > to be helpful because basing it on *purely* syntax didn't seem
>> > implementable; for the latter, we'd need to disallow cases such as x-x
>> > or require compilers to include artificial dependencies if a program
>> > indeed does value speculation. IOW, it didn't seem possible to draw a
>> > clear distinction between a data dependency defined purely based on
>> > syntax and control dependencies.
>> >
>> > Another way to perhaps phrase the requirement might be to construct it
>> > inductively, so that "E uses the value returned by L" becomes clearer.
>> > However, I currently don't really know how to do that without any holes.
>> >
>> > For example, let's say that an atomic mo_consume load has a value
>> > dependency to itself; the dependency has an associated set of values,
>> > which is equal to all values allowed by the type (but see also below).
>> > Then, an evaluation A that uses B as operand has a value dependency on B
>> > if the associated set of values can still have more than one element.
>> > Whether that's the case depends on the operation, and the other
>> > operands, including their values at the time of execution.
>> > However, it's not just results of evaluations that effectively constrain
>> > the set of values. Conditional execution does as well, because the
>> > condition being true might establish a constraint on L, which might
>> > remove any dependency when L (or a result of a computation involving L)
>> > is used. So I'm not sure the inductive approach would work.
>> >
>> > You said you thought I might have wanted to say that:
>> > "E really-depends on L if there exist two different values that (just
>> > according to typing) might be read for L that give rise to two
>> > different values for E".
>> >
>> > Which should follow from the above, or at least should not conflict with
>> > it. My "definition" tried to capture that the program must not
>> > establish so many constraints between E and L that the value of L is
>> > clear in the sense of having exactly one value. If we dereference such
>> > an E whose execution in fact constrains L to one value, there is no real
>> > dependency anymore.
>> > However, by itself, this doesn't cover that E must use L in it's
>> > computation -- which your formulation does, I think.
>>
>> unfortunately not - that's what the example above shows.
>
> Now I think I understand.
>
>> > Another way might be to try to define which constraints established by
>> > an execution (based on the executed code's semantics) actually remove
>> > value dependencies on L.
>> >
>> > Do you have any suggestions for how to define this (ie, true value
>> > dependencies) in a better way?
>>
>> not at the moment. I need to think some more about your vdps, though.
>> As I understand it, you're agreeing with the general intent of the
>> original C11 design that some new mechanism to tell the compiler not
>> to optimise in certain places in required, but differing in the way
>> you identify those?
>
> Yes, we still need such a mechanism because we need to prevent the
> compiler from doing value prediction or something similar that uses
> control dependencies to avoid data dependencies. For example, it could
> use a large switch statement to add code specialized for each possible
> value returned from an mo_consume load. Thus, at least for the
> standard, we need some mechanism that prevents certain compiler
> optimizations.
>
> Where it differs to C11 is that we're not trying to use just such a
> mechanism built on syntax to try to define when we actually have a real
> value dependency.
>
>>
>> >> Second, it involves reasoning about counterfactual executions. That
>> >> doesn't necessarily make it wrong, per se, but probably makes it hard
>> >> to work with. For example, suppose that in all the actual
>> >> whole-program executions, a runtime occurrence of L only ever returns
>> >> one particular value (perhaps because of some simple #define'd
>> >> configuration)
>> >
>> > Right, or just because it's a deterministic program.
>> >
>> > This is a problem for the definition of value dependencies, AFAICT,
>> > because where do you put the line for what the compiler is allowed to
>> > speculate about? When does the program indeed "reveal" the value of a
>> > load, and when does it not? Is a compiler to do out-of-band tracking
>> > for certain memory locations, and thus predict the value?
>> >
>> > Adding the assumption that L, at least conceptually, might return any
>> > value seemed to be a simple way to remove that uncertainty regarding
>> > what the compiler might know about.
>>
>> ...but it brings in a lot of baggage.
>
> Which baggage do you mean? In terms of verification, or definitions in
> the standard, or effects on compilers?

At least the first two. I'm not in a position to comment on the last,
but I'd hate to try to do compiler verification based on a semantics
that involves substantial quantification over counterfactual
hypothetical executions.

>> >> , and that the code used in the evaluation of E depends
>> >> on some invariant which is related to that configuration.
>> >
>> > A related issue that I've been thinking about is whether things like
>> > divide-by-zero (ie, operations that give rise to undefined behavior)
>> > should be considered as constraints on the values or not. I guess they
>> > should, but then this seems closer what you mention, invariants
>> > established by the whole program.
>> >
>> >> The
>> >> hypothetical execution used above in which a different value is used
>> >> is one in the code is being run in a situation with broken invariants.
>> >> Then there will be technical difficulties in using the definition:
>> >> I don't see how one would persuade oneself that a compiler always
>> >> satisfies it, because these hypothetical executions are far removed
>> >> from what it's actually working on.
>> >
>> > I think the above is easy to implement for a compiler. At the
>> > mo_consume load, you simply forget any information you have about this
>> > value / memory location; for operations on value_dep_preserving types,
>>
>> are all the subexpressions likewise forced to be of vdp types?
>
> I'd like to avoid that if possible. Things like "vdp-pointer + int"
> should Just Work in the sense of being still vdp-typed without requiring
> an explicit cast for the int operand. Operators ||, &&, ?: might
> justify more fine-grained rules, but I guess it's better to just make
> more stuff vdp if that's easier. vdp is not a sufficient condition for
> there being a value-dependency anyway, and the few optimizations it
> prevents on expressions containing some vdp-typed operands probably
> don't matter much. Also, as I mentioned in the reply to Paul, I believe
> that if the compiler can prove that there's no value-dependency, it can
> ignore the vdp-annotation (IOW, as-if is still possible).
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Markus Elfring: "[PATCH 5/5] Addition of a make file for build automation"
Previous message: Marcel Holtmann: "Re: [PATCH] Revert "Bluetooth: Use devname:vhci module alias for virtual HCI driver""
In reply to: Torvald Riegel: "Re: [RFC][PATCH 0/5] arch: atomic rework"
Next in thread: Torvald Riegel: "Re: [RFC][PATCH 0/5] arch: atomic rework"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]