RE: [PATCH v2 rdma-next] RDMA/mlx5: Enable Relaxed Ordering by default for kernel ULPs
From: David Laight
Date: Wed Jun 09 2021 - 11:48:41 EST
From: Jason Gunthorpe
> Sent: 09 June 2021 16:09
>
> On Wed, Jun 09, 2021 at 03:05:52PM +0000, David Laight wrote:
>
> > In principle some writel() could generate PCIe write TLP (going
> > to the target) that have the 'relaxed ordering' bit set.
>
> In Linux we call this writel_relaxed(), though I know of no
> implementation that sets the RO bit in the TLP based on this, it would
> be semantically correct to do so.
>
> writel() has strong order requirements and must not generate a RO TLP.
Somewhere I'd forgotten about that :-(
writel_relaxed() usually just allows the compiler and cpu hardware to
re-sequence the bus cycles.
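
A rough userspace sketch of that distinction, in case it helps. These are
stand-ins only, not the kernel's definitions: struct model_dev, the model_*
helpers and the descriptor layout are all made up for illustration, and the
C11 release fence only approximates the ordering a real writel() gives between
earlier memory writes and the register write.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical device: a descriptor in ordinary memory plus one register. */
struct model_dev {
	uint32_t desc[2];           /* stands in for a DMA descriptor in RAM */
	volatile uint32_t doorbell; /* stands in for an MMIO register */
};

/* Stand-in for writel_relaxed(): a plain store, with no ordering
 * against earlier writes to ordinary memory. */
static inline void model_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

/* Stand-in for writel(): earlier memory writes are ordered before
 * the register write (approximated here with a release fence). */
static inline void model_writel(uint32_t val, volatile uint32_t *reg)
{
	atomic_thread_fence(memory_order_release);
	*reg = val;
}

/* Fill the descriptor, then ring the doorbell with the ordered accessor
 * so the device cannot be told to fetch a half-written descriptor. */
static inline void model_post(struct model_dev *dev, uint32_t addr, uint32_t len)
{
	dev->desc[0] = addr;
	dev->desc[1] = len;
	model_writel(1, &dev->doorbell);
}
```

Using model_writel_relaxed() for the doorbell instead would need an explicit
barrier between the descriptor writes and the register write.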
OTOH I doubt any/many PCIe targets have 'memory' areas that would
benefit from RO write TLPs, especially since everything is organised
to use target-issued buffer copies.
I'm guessing that the benefits from RO are when the writes hit memory
that is on a NUMA node or 'differently cached'.
So writes to one cache line can proceed while earlier writes are
still waiting for the cache-coherency protocol.
From what I've seen writel() aren't too bad - they are async.
The real problem is readl().
The x86 cpu I have uses a separate TLP id (I've forgotten the correct
term) for each cpu core.
So while multiple cpus can (and do) issue concurrent reads, reads from
a single cpu happen one TLP at a time - even though it would be legitimate
for the out-of-order execution unit to issue additional read TLPs.
There are times when you really do have to do PIO buffer reads :-(
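
To put rough numbers on why one read TLP at a time hurts, here is a
back-of-envelope model. The helper name is made up, and the read size and
round-trip time are assumptions for illustration, not measurements from any
hardware discussed in this thread.

```c
#include <stdint.h>

/* Bytes per second when `outstanding` reads of `bytes_per_read` bytes
 * can be in flight at once and each takes `round_trip_ns` to complete. */
static inline uint64_t pio_read_bytes_per_sec(uint64_t bytes_per_read,
					      uint64_t outstanding,
					      uint64_t round_trip_ns)
{
	return bytes_per_read * outstanding * UINT64_C(1000000000) / round_trip_ns;
}
```

With an assumed 8 byte read and an assumed 1000ns round trip, one outstanding
read gives 8000000 bytes/sec; four outstanding reads would give 32000000.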
David