Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

From: Will Deacon
Date: Tue Feb 02 2016 - 12:51:33 EST


On Tue, Feb 02, 2016 at 09:30:26AM -0800, Linus Torvalds wrote:
> On Tue, Feb 2, 2016 at 1:34 AM, Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
> >
> > Just to be clear, what Will, Paul and I are discussing here is about
> > local transitivity,
>
> I really don't think that changes the picture.

For the general point about mixed methods, perhaps not, but it does
mean that we can't describe all of the issues using fewer than three
processors.

> Given that
>
> (a) we already mix ordering methods and there are good reasons for
> it, and I'd expect transitivity only makes that more likely
>
> (b) we expect transitivity from the individual ordering methods
>
> (c) I don't think that there are any relevant CPU's that violate this anyway
>
> I really think that not expecting that to hold for mixed accesses
> would be a complete disaster. It will confuse the hell out of people.
>
> And the basic argument really stands: we should make the memory
> ordering expectations as strong as we can, given the existing relevant
> architecture constraints (ie x86/arm/power).
>
> If that then means that some other architecture might need to add
> extra serialization that that architecture doesn't _want_ to add,
> tough luck. I absolutely hate the fact that alpha forced us to add
> that crazy read-depends barrier, and I want to discourage that a lot.
>
> In fact, I'd be willing to strengthen our existing orderings just in
> the name of sanity, and say that "rcu_dereference()" should just be an
> acquire, and say that if the architecture makes that more expensive,
> then who the hell cares? I have not been very happy with the "consume"
> memory ordering discussions for C++. Yes, it would hurt pre-lwsync
> power a bit, and it would hurt 32-bit arm, but enough that we should
> have the headache of the existing semantics?

Given that the vast majority of weakly ordered architectures respect
address dependencies, I would expect all of them to be hurt if they
were forced to use barrier instructions instead, even those where the
microarchitecture is fairly strongly ordered in practice.

Even load-acquire on ARMv8 has more work to do than a plain old address
dependency, so I'd be sad to see us upgrading rcu_dereference like this,
particularly when it's a relatively uncontentious, easy-to-understand
part of the kernel memory model.
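
To make the comparison concrete, here is a minimal user-space sketch using
C11 atomics rather than the kernel primitives. All names in it (struct item,
published, publish(), read_acquire(), read_dependency()) are hypothetical and
exist only for illustration. The second reader uses memory_order_consume as
a stand-in for the dependency-ordered load; note that current compilers
typically promote consume to acquire, so this shows the intent rather than
the generated code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload type, for illustration only. */
struct item {
    int value;
};

/* Hypothetical published pointer; zero-initialised, i.e. NULL. */
static _Atomic(struct item *) published;

/* Writer: initialise the object, then publish it with release ordering. */
static void publish(struct item *it, int v)
{
    it->value = v;
    atomic_store_explicit(&published, it, memory_order_release);
}

/*
 * Reader A: load-acquire. This orders *all* later accesses after the load,
 * which is more than a pointer dereference needs.
 */
static int read_acquire(void)
{
    struct item *p = atomic_load_explicit(&published, memory_order_acquire);
    return p ? p->value : -1;
}

/*
 * Reader B: dependency-ordered load. Only accesses whose address depends
 * on p are ordered after the load. Assumption: compilers generally
 * implement consume as acquire, so the output may match reader A.
 */
static int read_dependency(void)
{
    struct item *p = atomic_load_explicit(&published, memory_order_consume);
    return p ? p->value : -1;
}
```

Both readers return the published value once publish() has run; the
difference is only in how much ordering the hardware is asked to provide.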

As far as I understand it, the problems with "consume" have centred
largely around compiler and specification issues, which we don't have
with rcu_dereference (i.e. we ignore thin-air and use volatile casts and
barrier() to keep the optimizer at bay).
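
As a rough illustration of that approach, the sketch below shows a volatile
cast plus a compiler-only barrier in user-space C. It assumes GCC or Clang
(for __typeof__ and inline asm). The names load_once(), compiler_barrier(),
struct node and global_node are hypothetical, loosely modelled on the idea
behind the kernel's helpers, and are not the kernel's actual definitions.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only. */
struct node {
    int payload;
};

/*
 * Compiler-only barrier: emits no instruction, but the "memory" clobber
 * stops the compiler from moving memory accesses across it.
 */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Volatile cast: forces the compiler to perform the load exactly once,
 * rather than refetching or fusing it. Relies on the GNU __typeof__
 * extension.
 */
#define load_once(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical shared pointer. */
static struct node *global_node;

static int deref_payload(void)
{
    /* Single load of the pointer; the dereference below depends on it. */
    struct node *p = load_once(global_node);

    compiler_barrier();
    return p ? p->payload : -1;
}
```

Nothing here emits a hardware barrier; the ordering of the dependent load
is left to the CPU's handling of address dependencies.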

Will