Re: [PATCH] add delay between port write and port read

From: Will Deacon (will.deacon@xxxxxxx)
Date: Fri Mar 01 2019 - 12:33:52 EST

Hi Maciej,

On Wed, Feb 27, 2019 at 06:49:57PM +0000, Maciej W. Rozycki wrote:
> On Tue, 26 Feb 2019, Will Deacon wrote:
> > > If they are the same device (just different data ports), I'd
> > > *definitely* expect them to be ordered.
> > >
> > > We have tons of code that depends on that. Almost every driver out
> > > there, in fact.
> > >
> > > So we need the mb() on alpha to guarantee the access ordering on the
> > > CPU side, and then PCI itself ends up guaranteeing that accesses to
> > > the same device will remain ordered outside the CPU.
> > >
> > > Agreed?
> >
> > Yup, agreed. I'd consider all those ports to be the same endpoint, so we're
> > good.
> FAOD, I think this assumption/requirement only applies to the plain
> accessors (`inX', `readX', `ioreadX', etc.).

It's also a requirement for the *_relaxed accessors, and there are drivers
that rely on this being the case.

> For performance reasons we may decide sometime to opt in for accessors
> that do not suffer from the requirement to be strongly ordered WRT each
> other, for the benefit to architectures that are not strongly ordered with
> MMIO and that suffer a lot from serialising accesses that do not really
> care, e.g. where you need to load a bunch of device registers or maybe
> even device RAM in any order before making a serialised final request to
> accept the values loaded.

I'd expect accesses to device RAM to use something like ioremap_wc() if
possible. In that case, the ordering of accesses is weakened by the
underlying memory type in the page tables, but we're not yet at the point
where we've figured out the portable semantics in this case. I plan to
look at that once we've nailed normal ioremap()!

> I made provisions for that with a driver I recently added with commit
> 61414f5ec983 ("FDDI: defza: Add support for DEC FDDIcontroller 700
> TURBOchannel adapter"), where locally defined accessor macros suffixed
> with `_o' and `_u' denote accesses that have to be strongly ordered and
> can be weakly ordered respectively WRT each other.
> Right now they all expand to the respective `_relaxed' accessors (with a
> lone `dma_rmb' inserted appropriately; yes, the device does DMA one way
> only, and the other one is PIO with a lot of MMIO traffic to board RAM
> that would benefit from omitting barriers), however they can be replaced
> with references to truly unordered accessors if we ever have them.
> That piece of hardware is however rather peculiar and not an example of
> the most common design seen nowadays and I am not sure if the extra
> maintenance burden across all the ports for any additional accessors would
> be outweighed by the benefit for the weakly ordered MMIO architectures
> (where an execution stall can indeed count in hundreds of clock cycles per
> barrier inserted) combined with the appreciation (i.e. actual use) level
> from driver writers who do not necessarily grok all that weak ordering
> business.

If you use ioremap_wc() for device RAM and __raw_readX() for bunching up
register accesses, then I don't think we need to add anything new, do we?