Re: [PATCH 00/18] Cross-architecture definitions of relaxed MMIO accessors

From: Peter Zijlstra
Date: Thu Apr 17 2014 - 10:01:35 EST


On Thu, Apr 17, 2014 at 02:44:03PM +0100, Will Deacon wrote:
> Hello,
>
> This RFC series attempts to define a portable (i.e. cross-architecture)
> definition of the {readX,writeX}_relaxed MMIO accessor functions. These
> functions are already in widespread use amongst drivers (mainly those supporting
> devices embedded in ARM SoCs), but lack any well-defined semantics and,
> consequently, any portable definitions to allow these drivers to be compiled for
> other architectures.
>
> The two main motivations for this series are:
>
> (1) To promote use of the _relaxed MMIO accessors on weakly-ordered
> architectures, where they can bring significant performance improvements
> over their non-relaxed counterparts.
>
> (2) To allow COMPILE_TEST to build drivers using the relaxed accessors across
> all architectures.
>
> The proposed semantics largely match those provided by the ARM
> implementation (i.e. no weaker), with one exception (see below).
>
> Informally:
>
> - Relaxed accesses to the same device are ordered with respect to each other.
>
> - Relaxed accesses are *not* guaranteed to be ordered with respect to normal
> memory accesses (e.g. DMA buffers -- this is what gives us the performance
> boost over the non-relaxed versions).
>
> - Relaxed accesses are not guaranteed to be ordered with respect to
> LOCK/UNLOCK operations.
>
> In actual fact, the relaxed accessors *are* ordered with respect to LOCK/UNLOCK
> operations on ARM[64], but I have added this constraint for the benefit of
> PowerPC, which has expensive I/O barriers in the spin_unlock path for the
> non-relaxed accessors.
>
> A corollary to this is that mmiowb() probably needs rethinking. As it currently
> stands, an mmiowb() is required to order MMIO writes to a device from multiple
> CPUs, even if that device is protected by a lock. However, this isn't often used
> in practice, leading to PowerPC implementing both mmiowb() *and* synchronising
> I/O in spin_unlock.
>
> I would propose making the non-relaxed I/O accessors ordered with respect to
> LOCK/UNLOCK, leaving mmiowb() to be used with the relaxed accessors, if
> required, but would welcome thoughts/suggestions on this topic.

So the non-relaxed ops would already imply the expensive I/O barrier
(mmiowb()?), and PPC could therefore drop it from spin_unlock()?

Also, I read mmiowb() as MMIO-write-barrier(); what do we have to
order/contain MMIO reads?

I have _0_ experience with MMIO, so I've no idea if ordering/containing
reads is silly or not.
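
For reference, the locking pattern the quoted text describes (an MMIO write
under a lock, followed by mmiowb() before the unlock) might look roughly like
the sketch below. This is a userspace mock, not kernel code: every name ending
in _sketch, plus fake_reg, lock_held and barrier_count, is a hypothetical
stand-in invented for illustration, and the C11 fence only marks where the
barrier would sit. It does not reproduce what a real mmiowb() does to posted
MMIO writes on any architecture.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins: none of these are the kernel's definitions. */
static volatile uint32_t fake_reg;   /* pretend device register */
static int lock_held;                /* toy single-threaded "lock" state */
static int barrier_count;            /* counts barrier calls, for inspection */

static void spin_lock_sketch(void)
{
    assert(!lock_held);              /* toy lock: no recursion */
    lock_held = 1;
}

static void spin_unlock_sketch(void)
{
    assert(lock_held);
    lock_held = 0;
}

/* Stand-in for an MMIO write accessor: a plain volatile store. */
static void writel_sketch(uint32_t val, volatile uint32_t *addr)
{
    *addr = val;
}

/* Stand-in for mmiowb(): a CPU fence marking where the barrier goes. */
static void mmiowb_sketch(void)
{
    atomic_thread_fence(memory_order_seq_cst);
    barrier_count++;
}

/* The pattern under discussion: write, barrier, then release the lock. */
static void ring_doorbell_sketch(uint32_t val)
{
    spin_lock_sketch();
    writel_sketch(val, &fake_reg);
    mmiowb_sketch();                 /* order the MMIO write before the unlock */
    spin_unlock_sketch();
}
```

If the non-relaxed accessors were themselves ordered against LOCK/UNLOCK, as
proposed, the mmiowb_sketch() call in this pattern would only be needed when
the store is a relaxed one.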