Re: [PATCH] add delay between port write and port read

From: Maciej W. Rozycki (macro@xxxxxxxxxxxxxx)
Date: Wed Feb 27 2019 - 13:50:00 EST

On Tue, 26 Feb 2019, Will Deacon wrote:

> > If they are the same device (just different data ports), I'd
> > *definitely* expect them to be ordered.
> >
> > We have tons of code that depends on that. Almost every driver out
> > there, in fact.
> >
> > So we need the mb() on alpha to guarantee the access ordering on the
> > CPU side, and then PCI itself ends up guaranteeing that accesses to
> > the same device will remain ordered outside the CPU.
> >
> > Agreed?
> Yup, agreed. I'd consider all those ports to be the same endpoint, so we're
> good.

FAOD, I think this assumption/requirement only applies to the plain
accessors (`inX', `readX', `ioreadX', etc.).

For performance reasons we may decide sometime to opt in for accessors
that do not suffer from the requirement to be strongly ordered WRT each
other, for the benefit to architectures that are not strongly ordered with
MMIO and that suffer a lot from serialising accesses that do not really
care, e.g. where you need to load a bunch of device registers or maybe
even device RAM in any order before making a serialised final request to
accept the values loaded.

I made provisions for that with a driver I recently added with commit
61414f5ec983 ("FDDI: defza: Add support for DEC FDDIcontroller 700
TURBOchannel adapter"), where locally defined accessor macros suffixed
with `_o' and `_u' denote accesses that have to be strongly ordered and
can be weakly ordered respectively WRT each other.

Right now they all expand to the respective `_relaxed' accessors (with a
lone `dma_rmb' inserted appropriately; yes, the device does DMA one way
only, and the other one is PIO with a lot of MMIO traffic to board RAM
that would benefit from omitting barriers), however they can be replaced
with references to truly unordered accessors if we ever have them.

That piece of hardware is however rather peculiar and not an example of
the most common design seen nowadays and I am not sure if the extra
maintenance burden across all the ports for any additional accessors would
be outweighed by the benefit for the weakly ordered MMIO architectures
(where an execution stall can indeed count in hundreds of clock cycles per
barrier inserted) combined with the appreciation (i.e. actual use) level
from driver writers who do not necessarily grok all that weak ordering