Re: [RFC PATCH] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section

From: Arnd Bergmann
Date: Tue Feb 12 2019 - 08:03:26 EST

On Mon, Feb 11, 2019 at 6:29 PM Will Deacon <will.deacon@xxxxxxx> wrote:

> + __iomem pointers obtained with non-default attributes (e.g. those returned
> + by ioremap_wc()) are unlikely to provide many of these guarantees. If
> + ordering is required for such mappings, then the mandatory barriers should
> + be used in conjunction with the _relaxed() accessors defined below.

I wonder if we are even able to guarantee consistent behavior across
architectures in the last case here (wc mapping with relaxed accessors
and barriers).

Fortunately, there are only five implementations that actually differ from
ioremap_nocache(): arm32, arm64, ppc32, ppc64 and x86 (both 32/64), so
that is something we can probably figure out between the people on Cc.

The problem with recommending *_relaxed() + barrier() is that it ends
up being more expensive than the non-relaxed accessors whenever
_relaxed() implies the barrier already (true on most architectures other
than arm).
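
To make the cost argument concrete, here is a rough userspace model, not
kernel code. The model_* helpers are hypothetical stand-ins that count
barriers instead of issuing them, and the "strong" variant assumes an
architecture where the relaxed accessor already implies the barrier:

```c
#include <stdint.h>

/* Userspace model only: counts barriers rather than executing them. */
static unsigned int barriers_issued;

/* Hypothetical stand-in for a mandatory write barrier. */
static void model_wmb(void)
{
	barriers_issued++;
}

/*
 * Assumed architecture where the relaxed accessor is no weaker than
 * the normal one, i.e. the barrier is implied either way.
 */
static void model_writel_relaxed_strong(uint32_t val, volatile uint32_t *addr)
{
	model_wmb();		/* implied by the accessor on this model */
	*addr = val;
}

static void model_writel_strong(uint32_t val, volatile uint32_t *addr)
{
	model_writel_relaxed_strong(val, addr);	/* identical on this model */
}

/* Driver following the "relaxed accessor plus explicit barrier" advice. */
static unsigned int cost_relaxed_plus_barrier(volatile uint32_t *reg)
{
	barriers_issued = 0;
	model_wmb();				/* explicit barrier from the driver */
	model_writel_relaxed_strong(1, reg);	/* plus the implied one */
	return barriers_issued;
}

/* Driver that simply uses the non-relaxed accessor. */
static unsigned int cost_plain_writel(volatile uint32_t *reg)
{
	barriers_issued = 0;
	model_writel_strong(1, reg);
	return barriers_issued;
}
```

In this model the recommended pattern pays for two barriers per store where
the plain accessor pays for one. Real costs obviously depend on what the
barrier expands to on a given architecture.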

ioremap_wc() in turn is used almost exclusively to map RAM behind
a bus (typically for frame buffers), and we may be better off not
assuming any particular MMIO barrier semantics for it at all, and possibly
auditing the few uses that are not frame buffers.

> + Since many CPU architectures ultimately access these peripherals via an
> + internal virtual memory mapping, the portable ordering guarantees provided
> + by inX() and outX() are the same as those provided by readX() and writeX()
> + respectively when accessing a mapping with the default I/O attributes.

This is notably weaker than the PCI-mandated non-posted write semantics.
As I said earlier, not all architectures or PCI host implementations can
provide non-posted writes, but maybe we can document that fact here, e.g.

| Device drivers may expect outX() to be a non-posted write, i.e. waiting for
| a completion response from the I/O device, which may not be possible
| on particular hardware.

> (*) ioreadX(), iowriteX()
> These will perform appropriately for the type of access they're actually
> doing, be it inX()/outX() or readX()/writeX().

This probably needs clarification as well then: On architectures that
have a stronger barrier after outX() than writeX() but that use memory
mapped access for both, the statement is currently not true. We could
either strengthen the requirement by requiring CONFIG_GENERIC_IOMAP
on such architectures, or we could document the current behavior
as intentional and explicitly not allow iowriteX() on I/O ports to be posted.
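
For reference, a userspace sketch of the kind of cookie dispatch that a
generic iomap implementation performs. This is loosely modeled on my
recollection of lib/iomap.c; the MODEL_* constants and the helper names are
assumptions made for the illustration, not the kernel's definitions:

```c
/* Assumed cookie layout for the model: low range is port space. */
#define MODEL_PIO_OFFSET	0x10000UL
#define MODEL_PIO_MASK		0x0ffffUL
#define MODEL_PIO_RESERVED	0x40000UL

enum model_path {
	MODEL_PATH_BAD,		/* not a valid cookie */
	MODEL_PATH_OUTX,	/* would be forwarded to outX() */
	MODEL_PATH_WRITEX,	/* would be forwarded to writeX() */
};

/* Hypothetical equivalent of turning a port number into a cookie. */
static unsigned long model_ioport_cookie(unsigned long port)
{
	return MODEL_PIO_OFFSET + (port & MODEL_PIO_MASK);
}

/* Decide which underlying accessor an iowriteX() call would use. */
static enum model_path model_iowrite_path(unsigned long cookie)
{
	if (cookie >= MODEL_PIO_RESERVED)
		return MODEL_PATH_WRITEX;
	if (cookie > MODEL_PIO_OFFSET)
		return MODEL_PATH_OUTX;
	return MODEL_PATH_BAD;
}
```

With a dispatch like this, iowriteX() on a port cookie inherits whatever
outX() does, including its barrier. An architecture that maps ports into
memory and skips the dispatch sends both kinds of cookie down the writeX()
path, which is the case where the quoted statement does not hold.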

> +All of these accessors assume that the underlying peripheral is little-endian,
> +and will therefore perform byte-swapping operations on big-endian architectures.

This sounds like a useful addition and the only sane way to do it IMHO, but
I think at least traditionally we've had architectures that do not work like
this and instead make readX()/writeX() do native big-endian loads and stores,
with a hardware byteswap on the PCI bus.
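
As a small illustration of the behavior the new text describes, here is a
userspace model (again not kernel code, all model_* names are hypothetical).
It treats the register as four bytes in little-endian order and shows what a
CPU of either endianness would have to do to get the same value back:

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t model_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/*
 * What a native 32-bit load would return for the given register bytes
 * on a CPU of the given endianness.
 */
static uint32_t model_native_load(const uint8_t reg[4], int cpu_big_endian)
{
	if (cpu_big_endian)
		return ((uint32_t)reg[0] << 24) | ((uint32_t)reg[1] << 16) |
		       ((uint32_t)reg[2] << 8)  |  (uint32_t)reg[3];
	return ((uint32_t)reg[3] << 24) | ((uint32_t)reg[2] << 16) |
	       ((uint32_t)reg[1] << 8)  |  (uint32_t)reg[0];
}

/*
 * readl() as the patch text describes it: the device is assumed to be
 * little-endian, so a big-endian CPU swaps after the native load.
 */
static uint32_t model_readl(const uint8_t reg[4], int cpu_big_endian)
{
	uint32_t raw = model_native_load(reg, cpu_big_endian);

	return cpu_big_endian ? model_swab32(raw) : raw;
}
```

On the traditional platforms I mentioned, the swap in model_readl() is
effectively done by the bus hardware, so the accessor itself only performs
the native load.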