Re: [PATCH v5 3/5] drivers/soc/litex: add LiteX SoC Controller driver
From: Gabriel L. Somlo
Date: Wed Apr 29 2020 - 07:32:21 EST
Hi Ben,
On Wed, Apr 29, 2020 at 01:21:11PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2020-04-27 at 11:13 +0200, Mateusz Holenko wrote:
> > As Gabriel Somlo <gsomlo@xxxxxxxxx> suggested to me, I could still use
> > readl/writel/ioread/iowrite() standard functions providing memory
> > barriers *and* have values in CPU native endianness by using the
> > following constructs:
> >
> > `le32_to_cpu(readl(addr))`
> >
> > and
> >
> > `writel(cpu_to_le32(value), addr)`
> >
> > as le32_to_cpu/cpu_to_le32():
> > - does nothing on LE CPUs and
> > - reorders bytes on BE CPUs which in turn reverts swapping made by
> > readl() resulting in returning the original value.
>
> It's a bit sad... I don't understand why you need this. The HW has a
> fixed endian as you have mentioned earlier (and that is a good design).
>
> The fact that you are trying to shove things into a "smaller pipe" than
> the actual register shouldn't affect at what address the MSB and LSB
> reside. And readl/writel (or ioread32/iowrite32) will always be LE as
> well, so will match the HW layout. Thus I don't see why you need to
> play swapping games here.
>
> This however would be avoided completely if the HW was a tiny bit
> smarter and would do the multi-beat access for you which shouldn't be
> terribly hard to implement.
>
> That said, it would be even clearer if you just open coded the 2 or 3
> useful cases: 32/8, 32/16 and 32/32. The loop with calculated shifts
> (and no masks) makes the code hard to understand.
A "compound" LiteX MMIO register of 32 bits total, starting at address
0x82000004 and containing the value 0x12345678, is spread across four
8-bit subregisters, each aligned at ulong in the MMIO space, like this on LE:
0x82000000 00 00 00 00 12 00 00 00 34 00 00 00 56 00 00 00 ........4...V...
                       ^^^^^^^^^^^ ^^^^^^^^^^^ ^^^^^^^^^^^
0x82000010 78 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 x...............
           ^^^^^^^^^^^
and like this on BE:
0x82000000 00 00 00 00 00 00 00 12 00 00 00 34 00 00 00 56 ...........4...V
                       ^^^^^^^^^^^ ^^^^^^^^^^^ ^^^^^^^^^^^
0x82000010 00 00 00 78 00 00 00 00 00 00 00 00 00 00 00 00 ...x............
           ^^^^^^^^^^^
LiteX can optionally be built to use subregisters larger than 8 bits.
Here's an example with 16-bit subregisters (also aligned at ulong),
for the same "compound" register:
on LE:
0x82000000 00 00 00 00 34 12 00 00 78 56 00 00 00 00 00 00 ....4...xV......
                       ^^^^^^^^^^^ ^^^^^^^^^^^
and on BE:
0x82000000 00 00 00 00 00 00 12 34 00 00 56 78 00 00 00 00 .......4..Vx....
                       ^^^^^^^^^^^ ^^^^^^^^^^^
Essentially (back to the more common 8-bit subregister size), a compound
register foo = 0x12345678 is stored as
ulong foo[4] = {0x12, 0x34, 0x56, 0x78};
in the CPU's native endianness, aligned at the CPU's native word width
(hence "ulong").
With 16-bit subregisters that would then be:
ulong foo[2] = {0x1234, 0x5678};
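To make the layout concrete, here is a small userspace sketch (not code from the patch series) that reassembles a 32-bit compound value from such an array, most significant subregister first. The helper name compound_read32() and the idea of passing the subregister width as a parameter are assumptions made for illustration only; the array stands in for subregister slots that have already been read in native endianness.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: combine the subregisters
 * of one 32-bit compound register. 'sub' models the ulong-aligned slots
 * (most significant subregister first), 'subreg_bits' is the
 * subregister width: 8, 16 or 32.
 */
static uint32_t compound_read32(const unsigned long *sub, unsigned subreg_bits)
{
	size_t n = 32 / subreg_bits;                 /* number of subregisters */
	uint64_t mask = (1ull << subreg_bits) - 1;   /* keep only the valid bits */
	uint64_t val = 0;                            /* 64-bit to allow a 32-bit shift */
	size_t i;

	for (i = 0; i < n; i++)
		val = (val << subreg_bits) | (sub[i] & mask);

	return (uint32_t)val;
}
```

With the two arrays above, compound_read32(foo, 8) and compound_read32(foo, 16) both yield 0x12345678.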
The trouble with readl() and writel() is that they convert everything to
LE internally, which on BE would get us a byte-swapped value *within* each
subregister (e.g., 0x12000000 instead of 0x12, or 0x34120000 instead of
0x1234).
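The two values can be checked with a userspace sketch that interprets the same four bytes of a BE subregister slot both ways. The function names read_as_le32() and read_as_be32() are made up for this illustration and are not kernel APIs; the byte arrays are taken from the BE dumps above.

```c
#include <stdint.h>

/* Value produced by a little-endian-converting 32-bit read: byte 0 is the LSB. */
static uint32_t read_as_le32(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Value a big-endian CPU sees with a plain native load: byte 0 is the MSB. */
static uint32_t read_as_be32(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

For the 8-bit slot bytes 00 00 00 12, read_as_be32() gives 0x12 while read_as_le32() gives 0x12000000; for the 16-bit slot bytes 00 00 12 34, the results are 0x1234 and 0x34120000 respectively.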
The cleanest way (IMHO) to accomplish an endian-agnostic readl() (that
preserves both barriers AND native endianness) is to undo the internal
__le32_to_cpu() using:
cpu_to_le32(readl(addr))
This keeps us away from using any '__' internals directly (e.g.,
__raw_readl()), or open-coding our own `litex_readl()`, e.g.:
static inline u32 litex_readl(const volatile void __iomem *addr)
{
	u32 val;

	__io_br();
	val = __raw_readl(addr); /* No le32 byteswap here! */
	__io_ar(val);
	return val;
}
... which is something that was strongly advised against in earlier
revisions of this series.
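For what it's worth, the claim that cpu_to_le32(readl(addr)) hands back the native value on both endiannesses can be modeled in userspace. This is only a sketch: the model_*() functions are hypothetical stand-ins, the host endianness is passed in as a flag instead of being a compile-time property, and barriers are not modeled at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unconditional 32-bit byte swap. */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Model of readl(): a raw native load followed by le32_to_cpu(). */
static uint32_t model_readl(uint32_t native, bool host_is_be)
{
	return host_is_be ? swap32(native) : native;
}

/* Model of cpu_to_le32(): a swap on BE hosts, a no-op on LE hosts. */
static uint32_t model_cpu_to_le32(uint32_t v, bool host_is_be)
{
	return host_is_be ? swap32(v) : v;
}

/* Model of cpu_to_le32(readl(addr)): the two conversions cancel out. */
static uint32_t model_litex_read(uint32_t native, bool host_is_be)
{
	return model_cpu_to_le32(model_readl(native, host_is_be), host_is_be);
}
```

On a modeled BE host, model_readl(0x12, true) returns 0x12000000, while model_litex_read(0x12, true) returns 0x12 again; on a modeled LE host both calls are no-ops.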
Cheers,
--Gabriel