RE: [patches] Re: [PATCH v9 05/12] RISC-V: Atomic and Locking Code

From: Daniel Lustig
Date: Thu Nov 16 2017 - 12:12:47 EST


> From: Will Deacon [mailto:will.deacon@xxxxxxx]
> Hi Daniel,
>
> On Thu, Nov 16, 2017 at 06:40:46AM +0000, Daniel Lustig wrote:
> > > > In that case, maybe we should just start out having a fence on
> > > > both sides for
> > >
> > > Actually, given your architecture is RCsc rather than RCpc, so I
> > > think maybe you could follow the way that ARM uses(i.e. relaxed load
> > > + release store + a full barrier). You can see the commit log of
> > > 8e86f0b409a4
> > > ("arm64: atomics: fix use of acquire + release for full barrier
> > > semantics") for the reasoning:
> > >
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/c
> > > o
> > > mmit/?id=8e86f0b409a44193f1587e87b69c5dcf8f65be67
> >
> > OK I'm catching up now...and I have to say, that is clever! Thanks
> > for the corrections. It would definitely be good to avoid the double
> > fence. Let me think this over a bit more.
> >
> > One subtle point about RCpc vs. RCsc, though: see below.
> >
> > >
> > > Sounds great! Any estimation when we can see that(maybe a draft)?
> >
> > The latest should be November 28-30, coinciding with the next RISC-V
> workshop.
> > Possibly even sooner. We just recently resolved a big debate that's
> > been holding us up for a while, and so now it's just a matter of me
> > writing it all up so it's understandable.
> >
> > In the meantime, though, let me quickly and informally summarize some
> > of the highlights, in case it helps unblock any further review here:
> >
> > - The model is now (other-)multi-copy atomic
> > - .aq and .rl alone mean RCpc
> > - .aqrl means RCsc
>
> That presents you with an interesting choice when implementing locks: do
> you use .aqrl for lock and unlokc and elide smp_mb__after_spinlock, or do
> you use .aq/.rl for lock/unlock respectively and emit a fence for
> smp_mb__after_spinlock?

Good question, and I should probably do my homework before weighing in strongly
either way.

>
> > - .aq applies only to the read part of an RMW
> > - .rl applies only to the write part of an RMW
> > - Syntactic dependencies are now respected
>
> Thanks for the update, even this brief summary is better than the current
> architecture document ;)
>
> > I recognize this isn't enough detail to do it proper justice, but
> > we'll get the full model out as soon as we can.
>
> Ok, I'll bite! Do you forbid LB?

This was definitely one of the topics we debated. In the end, we chose to
allow LB but forbid LB+deps or stronger. CPUs may not produce LB all that
often, but we didn't see a super strong need to rule it out either.
Out-of-thin-air is covered now that we enforce dependencies. Also, GPUs do
produce LB, and that's relevant to our (NVIDIA's) use case at least.

I'm happy to keep answering other questions too.

Dan