Re: [PATCH] add delay between port write and port read

From: Maciej W. Rozycki (macro@xxxxxxxxxxxxxx)
Date: Fri Mar 01 2019 - 13:03:21 EST

Hi Will,

> > FAOD, I think this assumption/requirement only applies to the plain
> > accessors (`inX', `readX', `ioreadX', etc.).
> It's also a requirement for the *_relaxed accessors, and there are drivers
> that rely on this being the case.

Well, from the reading of memory-barriers.txt, be it as it stands, or
with your rewording applied, I take it the `*_relaxed' accessors do not
guarantee ordering WRT locking or DMA and hence a trailing barrier in
`readX_relaxed' is not necessary, and so we don't need a trailing `mb'
with the Alpha port either (but we do need a leading `mb' there, as well
as with `writeX_relaxed').

> > For performance reasons we may decide sometime to opt in for accessors
> > that do not suffer from the requirement to be strongly ordered WRT each
> > other, for the benefit to architectures that are not strongly ordered with
> > MMIO and that suffer a lot from serialising accesses that do not really
> > care, e.g. where you need to load a bunch of device registers or maybe
> > even device RAM in any order before making a serialised final request to
> > accept the values loaded.
> I'd expect accesses to device RAM to use something like ioremap_wc() if
> possible. In that case, the ordering of accesses is weakened by the
> underlying memory type in the page tables, but we're not yet at the point
> where we've figured out the portable semantics in this case. I plan to
> look at that once we've nailed normal ioremap()!

Some CPU hardware and certainly many if not most MIPS implementations do
not give such a finegrained control over mapping. I think all the MIPS
CPUs I dealt with only gave the choice between a cached (coherent or not)
and an uncached mapping, with the ordering being weak in all cases. So
the only control over ordering was with explicit barrier instructions.

> > That piece of hardware is however rather peculiar and not an example of
> > the most common design seen nowadays and I am not sure if the extra
> > maintenance burden across all the ports for any additional accessors would
> > be outweighed by the benefit for the weakly ordered MMIO architectures
> > (where an execution stall can indeed count in hundreds of clock cycles per
> > barrier inserted) combined with the appreciation (i.e. actual use) level
> > from driver writers who do not necessarily grok all that weak ordering
> > business.
> If you use ioremap_wc() for device RAM and __raw_readX() for bunching up
> register accesses, then I don't think we need to add anything new, do we?

Well, `__raw_*' accessors are never byte-swapped, not at least with the
MIPS port, making them a tad cumbersome for a driver that has no interest
in paying attention to any endianness mismatch between the CPU bus and the
device's peripheral bus.

Granted this does not matter for this driver as it only has a chance to
be used with DECstation (MIPS), DEC 3000 AXP (Alpha) and VAXstation 4000
(VAX) hardware, all of which are little-endian throughout (and not all of
which we have support for upstream). Still I'd consider it a hack as
obviously we do have drivers for options living on say PCI, which is
little-endian, that we want to use with processors configured for the big
endianness with their FSB.