Re: [PATCH 4/7] RISC-V: arch/riscv/include
From: Benjamin Herrenschmidt
Date: Tue May 23 2017 - 17:24:23 EST
On Tue, 2017-05-23 at 14:55 +0200, Arnd Bergmann wrote:
> > +
> > +#include <asm-generic/io.h>
>
> I would recommend providing your own {read,write}{b,w,l,q}{,_relaxed}
> helpers using inline assembly, to prevent the compiler for breaking
> up accesses into byte accesses.
>
> Also, most architectures require to some synchronization after a
> non-relaxed readl() to prevent prefetching of DMA buffers, and
> before a writel() to flush write buffers when a DMA gets triggered.
Right, I was about to comment on that one.
The question Palmer is about the ordering semantics of non-cached
storage.
What kind of ordering is provided architecturally ? Especially
between cachable and non-cachable loads and stores ?
Also, you have PCIe right ? What is the behaviour of MSIs ?
Does your HW provide a guarantee that in the case of a series of DMA
writes to memory by a device followed by an MSI, the CPU getting the
MSI will only get it after all the previous DMA writes have reached
coherency ? (Unlike LSIs where the driver is required to do an MMIO
read from the device, MSIs are expected to be ordered with data).
Another things with the read*() accessors. It's not uncommon for
a driver to do:
writel(1, reset_reg);
readl(reset_reg); /* flush posted writes */
udelay(10);
writel(0, reset_reg);
Now, in the above case, what can typically happen if you aren't careful
is that the readl which is intended to "push" the previous writel, will
not actually do its job because the return value hasn't been "consumed"
by the processor. Thus, the CPU will stick that on some kind of load
queue and won't actually wait for the return value before hitting the
delay loop.
Thus you might end up in a situation where the writel of 1 to the
device is itself reaching the device way after you started the delay
loop, and thus end up violating the delay requirement of the HW.
On powerpc we solve that by using a special instruction construct
inside the read* accessors that prevents the CPU from executing
subsequent instructions until the read value has been returned.
You may want to consider something similar.
Cheers,
Ben.