Re: [PATCH] wifi: mt76: mmio_(read|write)_copy byte swap when on Big Endian

From: Caleb James DeLisle

Date: Wed Oct 29 2025 - 16:40:50 EST



On 29/10/2025 21:12, Jonas Gorski wrote:
On Wed, Oct 29, 2025 at 4:24 PM Caleb James DeLisle <cjd@xxxxxxxx> wrote:

On 29/10/2025 10:15, Jonas Gorski wrote:
On Tue, Oct 28, 2025 at 10:42 PM Caleb James DeLisle <cjd@xxxxxxxx> wrote:
On 28/10/2025 21:19, Jonas Gorski wrote:
Hi,

On Mon, Oct 27, 2025 at 6:19 PM Caleb James DeLisle <cjd@xxxxxxxx> wrote:
When on a Big Endian machine, PCI swaps words to/from LE when
reading/writing them. This presents a problem when we're trying
to copy an opaque byte array such as firmware or encryption key.

Byte-swapping during copy results in two swaps, but solves the
problem.

Fixes:
mt76x2e 0000:02:00.0: ROM patch build: 20141115060606a
mt76x2e 0000:02:00.0: Firmware Version: 0.0.00
mt76x2e 0000:02:00.0: Build: 1
mt76x2e 0000:02:00.0: Build Time: 201607111443____
mt76x2e 0000:02:00.0: Firmware failed to start
mt76x2e 0000:02:00.0: probe with driver mt76x2e failed with error -145

Signed-off-by: Caleb James DeLisle <cjd@xxxxxxxx>
---
drivers/net/wireless/mediatek/mt76/mmio.c | 34 +++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/drivers/net/wireless/mediatek/mt76/mmio.c b/drivers/net/wireless/mediatek/mt76/mmio.c
index cd2e9737c3bf..776dbaacc8a3 100644
--- a/drivers/net/wireless/mediatek/mt76/mmio.c
+++ b/drivers/net/wireless/mediatek/mt76/mmio.c
@@ -30,15 +30,49 @@ static u32 mt76_mmio_rmw(struct mt76_dev *dev, u32 offset, u32 mask, u32 val)
return val;
}

+static void mt76_mmio_write_copy_portable(void __iomem *dst,
+ const u8 *src, int len)
+{
+ __le32 val;
+ int i = 0;
+
+ for (i = 0; i < ALIGN(len, 4); i += 4) {
+ memcpy(&val, src + i, sizeof(val));
+ writel(cpu_to_le32(val), dst + i);
+ }
+}
+
static void mt76_mmio_write_copy(struct mt76_dev *dev, u32 offset,
const void *data, int len)
{
+ if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) {
+ mt76_mmio_write_copy_portable(dev->mmio.regs + offset, data,
+ len);
+ return;
+ }
__iowrite32_copy(dev->mmio.regs + offset, data, DIV_ROUND_UP(len, 4));
Maybe just replace this with memcpy_toio() which does no swapping at
all instead of double swapping on BE?
I'm not that informed about how PCI works so I had to test to confirm
my understanding, but I can confirm that memcpy_toio() does not solve
the problem.
Ah, right, I misread _iowrite32_copy() to do conversion to LE, but it doesn't.

What architecture is this you have? PowerPC? ARM? MIPS? 32 bit? 64 bit?

MIPS32 (EcoNet EN751221 34Kc)


So the differences I see are:

1. __iowrite32_copy() uses __raw_writel(), which has different memory
semantics than writel()
2. __iowrite32_copy() assumed src is aligned to 32 bit, while you
explicitly align it
3. memcpy_toio() will handle unaligned src properly, but does 64 bit
accesses on 64 bit systems (and uses __raw* again).

Is src aligned? If not, then the issue might be 2. And if your system
is 64 bit, it would explain why 3 didn't help.

I'm not a regular developer of mt76 so I wasn't sure if that was
guaranteed and I just wanted to code for safety.

After reviewing the code, I see that there are a few places where
mt76_mmio_write_copy is being called with stack-allocated u8 arrays
so it's pretty clear to me that this is being treated as a memcpy-like
function and we should be handling unaligned inputs.


As a first step you could try to replace the writel(cpu_to_le32(val)
with a iowrite32be(val, ...) which should do the same except avoiding
the doubled byte swapping. If that works, you can try to replace it
This works.

These symbols are a bit of a nightmare to trace, so I ended up making
an .i file so I could confirm what's happening.

iowrite32be() uses the version in iomap.c so I understand that's using
writel(swab32(val),port), so a writel with an unconditional byte swap.
writel() is more complicated, it's an inline function that is generated
in a rat's nest of preprocessor macros in mips/include/asm/io.h

The preprocessed is this:

__mem = (void *)((unsigned long)(mem)); __val = (val); if (sizeof(u32)
!= sizeof(u64) || sizeof(u64) == sizeof(long)) { *__mem = __val;

The source is this:

__mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem)); \
__val = pfx##ioswab##bwlq(__mem, val); \
if (sizeof(type) != sizeof(u64) || sizeof(u64) == sizeof(long)) \
*__mem = __val; \

The line "pfx##ioswab##bwlq(__mem, val);" is ioswabl() and the source
of that explains the issue:

* Sane hardware offers swapping of PCI/ISA I/O space accesses in hardware;
* less sane hardware forces software to fiddle with this...

So this confirms my initial understanding, the PCI hardware is doing the
byte swapping unconditionally.


with __raw_writel(), which then would make this the same as
__iowrite32_copy, except making sure that src is aligned.

This fails.

Since I'm the maintainer of this SoC and it's still fairly new, I wrote
a trivial kmod to verify that unaligned access is not just silently
returning trash, it works as though it were aligned so alignment is
not the issue.


Also you could replace your memcpy() with get_unaligned((u32 *)(src +
i)); Should do the same but inline.
Good idea, I will do this.
The issue as I understand it is that rather than making every driver
carefully call cpu_to_le*() every MMIO write, someone decided to make
the PCI host bridge itself transparently byte-swap all MMIO on the
wire. Since most MMIO is hitting registers and most buffers are
transferred by DMA, for the most part everything works and nobody
notices.

But in the rare case that we need to write a blob to MMIO, it gets
transparently swapped in hardware so you need to use cpu_to_le in that
case. Doing a search of ./drivers for write.*cpu_to_le I can see this
issue comes up a bit.
Every (PCI) driver does conversion to LE implicitly by using
writel/readl (or iowrite32/ioread32) etc do the conversion to/from LE.
So writel(foo, dst )is a __raw_writel(cpu_to_le32(foo), dst) etc. PCI
memory is assumed to be in LE. If you are on a little endian system,
then no byte swapping happens, and on BE it will do byte swapping
before writing the value.
Okay so it seems that in the case of MIPS, that's not always how it
works.

https://github.com/torvalds/linux/blob/e53642b87a4f4b03a8d7e5f8507fc3cd0c595ea6/arch/mips/include/asm/mach-generic/mangle-port.h#L17

Since we don't know if the swap will happen in hardware or software
and (AFAIK) this is not a hot enough path that double-swapping will
have a notable performance penalty, I think the most sane thing to
do is use writel(cpu_to_le32()) and not care if it's swapped back
in the kernel or hardware.
Oh, I think I see what it happening here. ECONET is a Big Endian MIPS
platform, but does not select SWAP_IO_SPACE (most other BE platforms
do).

Does that mean the PCI space is swapped in hardware there?

I guess that means that anything that uses __raw accessors to PCI
space directly or indirectly is broken, as the raw data is now
actually in the wrong order and needs to be swab'd.

I don't know if it is a good idea to change this in __iowrite32_copy()
/ __ioread32_copy() (and other helpers), or if there are drivers that
use it on non-PCI spaces and would be broken by that.

If there is a way, I would suggest disabling hardware conversion and
selecting SWAP_IO_SPACE, but that will affect a lot of your code that
assumes that writel() etc don't convert to/from little endian.


I can look around in the hardware registers and see if I can shut it
off for EcoNet, but if you're saying MT76 should not support BE unless
they disable hardware swapping and use SWAP_IO_SPACE, that means the
majority of BE hardware on OpenWrt is not going to be supported. If
that's the decision then it at least warrants clear documentation.

Thanks,
Caleb


user@cjd-dev:~/en7526/openwrt$ find ./ -name 'config-6.12' | while read x; do grep -q 'CPU_BIG_ENDIAN=y' "$x" && ( grep -q 'SWAP_IO_SPACE=y' "$x" || echo "$x
 does not use SWAP_IO_SPACE" ) ; done
./target/linux/apm821xx/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl931x/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl930x_nand/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl839x/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl931x_nand/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl930x/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl838x/config-6.12 does not use SWAP_IO_SPACE
./target/linux/ath79/config-6.12 does not use SWAP_IO_SPACE
./target/linux/octeon/config-6.12 does not use SWAP_IO_SPACE
./target/linux/ixp4xx/config-6.12 does not use SWAP_IO_SPACE
./target/linux/econet/en751221/config-6.12 does not use SWAP_IO_SPACE
user@cjd-dev:~/en7526/openwrt$ find ./ -name 'config-6.12' | while read x; do grep -q 'CPU_BIG_ENDIAN=y' "$x" && ( grep -q 'SWAP_IO_SPACE=y' "$x" && echo "$x
 does use SWAP_IO_SPACE" ) ; done
./target/linux/bmips/bcm6358/config-6.12 does use SWAP_IO_SPACE
./target/linux/bmips/bcm6328/config-6.12 does use SWAP_IO_SPACE
./target/linux/bmips/bcm6318/config-6.12 does use SWAP_IO_SPACE
./target/linux/bmips/bcm6368/config-6.12 does use SWAP_IO_SPACE
./target/linux/bmips/bcm6362/config-6.12 does use SWAP_IO_SPACE
./target/linux/bmips/bcm63268/config-6.12 does use SWAP_IO_SPACE
./target/linux/lantiq/config-6.12 does use SWAP_IO_SPACE
user@cjd-dev:~/en7526/openwrt$



Best regards,
Jonas