Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

From: Maciej W. Rozycki
Date: Wed Jan 27 2016 - 04:57:37 EST


On Thu, 12 Nov 2015, David Daney wrote:

> > > Certainly we can load up the code with "SYNC" all over the place, but
> > > it will kill performance on SMP systems. So, my vote would be to make
> > > it as light weight as possible, but no lighter. That will mean
> > > inventing the proper barrier primitives.
> >
> > It seems to me that the proper barrier here is a "SYNC 18" aka
> > SYNC_RELEASE instruction, at least on CPUs that implement that variant.

For the record, we've had "cooked" aliases in the toolchain for a short
while now -- since Sep 2010 or binutils 2.21 -- so for readability you can
actually use `sync_release' in your source code rather than obscure `sync
18' (of course you could define a macro instead, but there's no need now),
and disassembly will show the "cooked" mnemonic too.

Although Documentation/Changes still lists binutils 2.12 as the minimum,
so perhaps using macros is indeed the way to go now, at least for the time
being.
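
For illustration only, a minimal sketch of what such a macro could look
like. The names `mips_sync_release', `STYPE_SYNC_RELEASE' and
`sketch_spin_unlock' are made up here and not taken from the kernel tree,
and the non-MIPS branch is an assumption added only so that the snippet
builds on a host for experimentation. A real kernel version may also need
to set the assembler ISA level around the instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* stype value for SYNC_RELEASE, i.e. the "sync 18" from the thread. */
#define STYPE_SYNC_RELEASE 18

#define SKETCH_STR_1(x) #x
#define SKETCH_STR(x) SKETCH_STR_1(x)

#if defined(__mips__)
/* Numeric operand, so the "cooked" sync_release mnemonic is not needed. */
# define mips_sync_release() \
	__asm__ __volatile__("sync " SKETCH_STR(STYPE_SYNC_RELEASE) : : : "memory")
#else
/* Assumption: host build only; stands in for the MIPS instruction. */
# define mips_sync_release() __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

/* Hypothetical unlock: order prior accesses, then clear the lock word. */
static inline void sketch_spin_unlock(volatile uint32_t *lock)
{
	assert(lock != NULL);
	mips_sync_release();
	*lock = 0;
}
```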

> Yes, unfortunately very few CPUs implement that. It is an instruction that
> MIPS invented only recently, so older CPUs need a different solution.

Hmm, it looks to me like we might actually be safe, although, as is often
the case, the situation seems more complicated than it had to be.

Conventional wisdom says that SYNC as the ultimate ordering barrier, aka
SYNC 0, was added with the MIPS II ISA, with a provision to define less
restrictive barriers in the future in a backward compatible manner, by the
means of undefined (any non-zero at the time) barrier types defaulting to
0. Early references seem to have been lost in the mists of time; however, a
few legacy MIPS ISA documents remain, e.g. the MIPS IV ISA document
says[1]:

"The stype values 1-31 are reserved; they produce the same result as the
value zero."

making it clear that non-zero arguments will work as expected, albeit
perhaps with a somewhat heavyweight effect. But there's sometimes no
other way.

This seems more ambiguous with earlier documentation available, e.g. the
MIPS R4000 processor manual, which omits the mention of `stype' altogether
and merely defines a single SYNC instruction encoding with all-zeros
across bits 25:6 of the instruction word, among which `stype' normally
lives[2]. This appears the same with other MIPS III processor
documentation (e.g. IDT 79RV4700[3]). However I'm fairly sure all these
simply did not bother decoding SYNC beyond the major and minor opcode, so
again SYNC 0 semantics should be held across the more recently defined
variants. I could actually verify this sometime with an R4000 class
processor.
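
To make the encoding argument concrete, here is a rough sketch of the SYNC
instruction word (SPECIAL major opcode, SYNC function code, `stype' in
bits 10:6) together with a decoder that looks at the major and minor
opcode only. The helper names are hypothetical, and the decoder merely
models the behaviour suspected above; it is not a description of any
particular processor.

```c
#include <assert.h>
#include <stdint.h>

#define MIPS_OP_SPECIAL  0x00u	/* major opcode, bits 31:26 */
#define MIPS_FUNCT_SYNC  0x0fu	/* minor opcode (function), bits 5:0 */

/* Build a SYNC instruction word; stype occupies bits 10:6. */
static inline uint32_t mips_encode_sync(unsigned int stype)
{
	assert(stype < 32);
	return (MIPS_OP_SPECIAL << 26) | ((uint32_t)stype << 6) |
	       MIPS_FUNCT_SYNC;
}

/*
 * Decoder that checks the major and minor opcode only and ignores
 * bits 25:6, so every stype is recognised as plain SYNC.
 */
static inline int decodes_as_sync(uint32_t insn)
{
	return (insn >> 26) == MIPS_OP_SPECIAL &&
	       (insn & 0x3fu) == MIPS_FUNCT_SYNC;
}
```

With such a decoder `sync 18' (word 0x0000048f) and `sync 0' (word
0x0000000f) are indistinguishable, which is the SYNC 0 fallback we want.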

Modern MIPS architecture specifications started with the same definition
as the MIPS IV ISA had; rev. 0.95 documents still stated[4][5]:

"The stype values 1-31 are reserved; they produce the same result as the
value zero."

Unfortunately the requirement got weakened later on; rev. 1.00
architecture specifications stated[6][7]:

"The stype values 1-31 are reserved for future extensions to the
architecture. A value of zero will always be defined such that it
performs all defined synchronization operations. Non-zero values may be
defined to remove some synchronization operations. As such, software
should never use a non-zero value of the stype field, as this may
inadvertently cause future failures if non-zero values remove
synchronization operations."

I think the intent was not to break backwards compatibility, and certainly
anyone who looked at one of the earlier documents might have realised that
implementing non-zero SYNC operations that do not have vendor-specific
semantics as aliases to SYNC 0, rather than as NOPs or RI triggers, would
be a good idea. However implementers may not have been able to infer that
from reading the lone current revision of architecture documents.

It was only with rev. 2.60 of the architecture specifications that, along
with new SYNC operations, the requirement for undefined SYNC operations to
behave as SYNC 0 was put back in the text in an unambiguous form[8][9]:

"A stype value of zero will always be defined such that it performs the
most complete set of synchronization operations that are defined. This
means stype zero always does a completion barrier that affects both loads
and stores preceding the SYNC instruction and both loads and stores that
are subsequent to the SYNC instruction. Non-zero values of stype may be
defined by the architecture or specific implementations to perform
synchronization behaviors that are less complete than that of stype zero.
If an implementation does not use one of these non-zero values to define a
different synchronization behavior, then that non-zero value of stype must
act the same as stype zero completion barrier. This allows software
written for an implementation with a lighter-weight barrier to work on
another implementation which only implements the stype zero completion
barrier."

This definition has been retained in the architecture specification ever
since.
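
The rule quoted from rev. 2.60 can be sketched as a small model, assuming
a hypothetical bitmask that describes which non-zero `stype' values a
given implementation gives a lighter-weight behaviour to. Neither the
function name nor the mask comes from any specification or from the
kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * implemented: bit n set means the (hypothetical) CPU defines its own,
 * lighter-weight behaviour for stype n.  Any other stype must act as
 * the stype 0 completion barrier.
 */
static inline unsigned int effective_stype(unsigned int stype,
					   uint32_t implemented)
{
	assert(stype < 32);
	return (implemented & (UINT32_C(1) << stype)) ? stype : 0;
}
```

So on a CPU that knows only the completion barrier, effective_stype(18, 0)
yields 0, and code written with SYNC_RELEASE still gets a barrier that is
at least as strong as the one requested.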

Overall I think it should be safe after all to use SYNC_RELEASE and other
modern lightweight barriers unconditionally under the assumption that the
architecture was meant to remain backward compatible. Even though it
might be possible someone would implement unusual semantics for the then
undefined `stype' values, I highly doubt it as it would be extra effort
and hardware logic space for no gain. We could try and reach architecture
overseers to double-check whether the `stype' encodings, somewhat
irregularly distributed, were indeed defined in a manner so as not to
clash with values implementers chose to use before rev. 2.60 of the
architecture specification.

Then, for performance reasons, if there were indeed any pre-2.60
implementations which define vendor-specific lightweight barriers, then we
could replace the standard encoding embedded in the kernel binary, by
run-time patching the image up at bootstrap, based on the processor type
identified in cpu-probe.c. Likewise, for implementations that are weakly
enough ordered to define SYNC as an actual barrier rather than a different
encoding of NOP (e.g. the NEC VR4100 is strongly ordered and implements
SYNC as a NOP[10]), yet strongly enough ordered for some of the other
barriers not to be necessary, the respective barriers could be patched up
with NOPs.
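
A rough sketch of how such bootstrap patching might look, modelled on an
ordinary array of instruction words so that it can be tried on a host.
The `sync_policy' structure, the list of patch sites and the function name
are all hypothetical; nothing like this exists in cpu-probe.c as far as
this sketch is concerned. A real implementation would also have to flush
the instruction cache and run before secondary CPUs are started.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIPS_NOP          UINT32_C(0x00000000)
#define MIPS_SYNC_FUNCT   UINT32_C(0x0000000f)	/* SPECIAL opcode, SYNC function */
#define MIPS_STYPE_SHIFT  6
#define MIPS_STYPE_MASK   (UINT32_C(0x1f) << MIPS_STYPE_SHIFT)

/* Hypothetical per-CPU policy, as might be chosen by processor probing. */
struct sync_policy {
	uint32_t nop_mask;	/* bit n set: stype n is unnecessary, use NOP */
	uint8_t remap[32];	/* replacement stype for each standard stype */
};

/* Rewrite each recorded SYNC site in place; returns the number changed. */
static size_t patch_sync_sites(uint32_t **sites, size_t nsites,
			       const struct sync_policy *policy)
{
	size_t changed = 0;

	for (size_t i = 0; i < nsites; i++) {
		uint32_t insn = *sites[i];
		unsigned int stype = (insn & MIPS_STYPE_MASK) >> MIPS_STYPE_SHIFT;
		uint32_t repl;

		/* Only words that really are SYNC instructions may be listed. */
		assert((insn & ~MIPS_STYPE_MASK) == MIPS_SYNC_FUNCT);

		if (policy->nop_mask & (UINT32_C(1) << stype))
			repl = MIPS_NOP;
		else
			repl = ((uint32_t)policy->remap[stype] << MIPS_STYPE_SHIFT) |
			       MIPS_SYNC_FUNCT;

		if (repl != insn) {
			*sites[i] = repl;
			changed++;
		}
	}
	return changed;
}
```

An identity `remap' with an empty `nop_mask' leaves the image untouched,
which would be the default for processors that implement the standard
encodings.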

For I/O ordering and completion barriers, mentioned earlier in the
thread, on the MIPS target we need a different set of primitives, as some
early incarnations of the architecture were weakly ordered in this respect
in a somewhat unusual way, at least to some. Only reads were strongly
ordered in all cases. However writes could bypass each other, could be
merged, or could be removed altogether (preempted with a later one).
Then reads could bypass writes or read back a pending write. None of this
matters for true memory, however it certainly does for I/O, where side
effects exist or timely completion is required.

I have previously outlined what needs to be implemented in this area, as
recorded here:
<http://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=alpine.LFD.2.11.1404280048540.11598%40eddie.linux-mips.org>,
to unify the uncoordinated platform attempts made so far. I still have it
on my to-do list and hope to get to it soon.

References:

[1] "MIPS IV Instruction Set", MIPS Technologies, Inc., Revision 3.2, By
Charles Price, September, 1995, p. A-161
<http://techpubs.sgi.com/library/manuals/2000/007-2597-001/pdf/007-2597-001.pdf>

[2] Joe Heinrich: "MIPS R4000 Microprocessor User's Manual", Second
Edition, MIPS Technologies, Inc., April 1, 1994, p. A-161
<http://techpubs.sgi.com/library/manuals/2000/007-2489-001/pdf/007-2489-001.pdf>

[3] "IDT79RV4700 RISC Processor Hardware User's Manual", Integrated
Device Technology, Inc., Version 2.1, December 1997, p. A-130

[4] "MIPS32 Architecture For Programmers, Volume II: The MIPS32
Instruction Set", MIPS Technologies, Inc., Document Number: MD00086,
Revision 0.95, March 12, 2001, p. 215

[5] "MIPS64 Architecture For Programmers, Volume II: The MIPS64
Instruction Set", MIPS Technologies, Inc., Document Number: MD00087,
Revision 0.95, March 12, 2001, p. 300

[6] "MIPS32 Architecture For Programmers, Volume II: The MIPS32
Instruction Set", MIPS Technologies, Inc., Document Number: MD00086,
Revision 1.00, August 29, 2002, p. 209

[7] "MIPS64 Architecture For Programmers, Volume II: The MIPS64
Instruction Set", MIPS Technologies, Inc., Document Number: MD00087,
Revision 1.00, August 29, 2002, p. 295

[8] "MIPS32 Architecture For Programmers, Volume II: The MIPS32
Instruction Set", MIPS Technologies, Inc., Document Number: MD00086,
Revision 2.60, June 25, 2008, p. 250

[9] "MIPS64 Architecture For Programmers, Volume II: The MIPS64
Instruction Set", MIPS Technologies, Inc., Document Number: MD00087,
Revision 2.60, June 25, 2008, p. 317

[10] "VR4100 64-BIT MICROPROCESSOR USER'S MANUAL (PRELIMINARY)", NEC
Corporation, Document No. U10050EJ3V0UM00 (3rd edition), January
1996, p. 413

Maciej