Re: [PATCH] mips: bmips: BCM6358: disable arch_sync_dma_for_cpu_all()

From: Jonas Gorski
Date: Sat Mar 11 2023 - 14:44:27 EST


On Sat, 11 Mar 2023 at 18:32, Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>
>
>
> On 3/10/2023 4:13 AM, Álvaro Fernández Rojas wrote:
> > arch_sync_dma_for_cpu_all() causes kernel panics on BCM6358 with EHCI/OHCI:
> > [ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform
> > [ 3.895011] Reserved instruction in kernel code[#1]:
> > [ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
> > [ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060
> > [ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0
> > [ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000
> > [ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa
> > [ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470
> > [ 3.932848] $20 : 00000000 00000000 55590000 77d70000
> > [ 3.938251] $24 : 00000018 00000010
> > [ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc
> > [ 3.949058] Hi : 00000000
> > [ 3.952013] Lo : 00000000
> > [ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c
> > [ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c
> > [ 3.965913] Status: 10008703 KERNEL EXL IE
> > [ 3.970216] Cause : 00800028 (ExcCode 0a)
> > [ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350)
> > [ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
> > [ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8)
> > [ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470
> > [ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74
> > [ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003
> > [ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000
> > [ 4.035512] 77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000
> > [ 4.044196] ...
> > [ 4.046706] Call Trace:
> > [ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
> > [ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
> > [ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
> > [ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
> > [ 4.069036]
> > [ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010
> > [ 4.080686]
> > [ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
> > [ 4.087374] Kernel panic - not syncing: Fatal exception
> > [ 4.092753] Rebooting in 1 seconds..
>
> Did you pinpoint which specific instruction within
> arch_sync_dma_for_cpu_all() is causing the reserved instruction exception?

It's setup_sigcontext(), not arch_sync_dma_for_cpu_all() that's
causing the exception ;-)
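
As a cross-check on the exception type, the ExcCode field can be pulled out of the Cause value in the quoted oops. This is only a sketch: the helper names are made up for illustration, and the table lists just a few of the MIPS32 exception codes.

```python
# Sketch: extract ExcCode (bits 6..2) from the MIPS Cause register value.
# Helper names are hypothetical; the table is deliberately partial.
EXC_NAMES = {
    0x00: "Int (interrupt)",
    0x04: "AdEL (address error, load/fetch)",
    0x05: "AdES (address error, store)",
    0x08: "Sys (syscall)",
    0x09: "Bp (breakpoint)",
    0x0a: "RI (reserved instruction)",
    0x0b: "CpU (coprocessor unusable)",
}

def exc_code(cause):
    """Return the 5-bit ExcCode field of a Cause register value."""
    return (cause >> 2) & 0x1f

def exc_name(cause):
    """Return a readable name for the ExcCode, if it is in the table."""
    return EXC_NAMES.get(exc_code(cause), "unknown")

cause = 0x00800028  # "Cause : 00800028 (ExcCode 0a)" from the oops
print(f"ExcCode {exc_code(cause):02x}: {exc_name(cause)}")
```

That agrees with the "Reserved instruction in kernel code" line at the top of the oops.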

Hand-decoding the Code line gives me:

lw $v1, 0xb4($fp)
or $v0, $zero, $zero
addiu $a0, $s1, 8
sw $v0, 0($a0) <- the code in brackets, so I guess EPC?
sw $v1, 4($a0)

which I assume is this part:

err |= __put_user(regs->cp0_epc, &sc->sc_pc);

(0xb4 is the offset of cp0_epc, 0x8 the offset of sc_pc)
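
The hand decoding above can be reproduced mechanically. The following is a minimal sketch that only understands the four instruction types appearing in those five words (lw, sw, addiu, or); the function and table names are mine, not from any existing tool, and anything else is printed as a raw .word.

```python
# Sketch: decode the first five words of the oops "Code:" line.
# Only lw/sw/addiu/or are handled; everything else falls back to .word.
REG = ["zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
       "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
       "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
       "t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra"]

def sign16(value):
    """Sign-extend a 16-bit immediate."""
    return value - 0x10000 if value & 0x8000 else value

def decode(word):
    """Return assembly text for one 32-bit MIPS32 instruction word."""
    op = word >> 26
    rs = (word >> 21) & 0x1f
    rt = (word >> 16) & 0x1f
    rd = (word >> 11) & 0x1f
    funct = word & 0x3f
    imm = sign16(word & 0xffff)
    if op == 0x23:
        return f"lw ${REG[rt]}, {imm:#x}(${REG[rs]})"
    if op == 0x2b:
        return f"sw ${REG[rt]}, {imm:#x}(${REG[rs]})"
    if op == 0x09:
        return f"addiu ${REG[rt]}, ${REG[rs]}, {imm}"
    if op == 0x00 and funct == 0x25:
        return f"or ${REG[rd]}, ${REG[rs]}, ${REG[rt]}"
    return f".word {word:#010x}"

# First five words of the Code line; <...> marks the word at epc.
CODE = "8fc300b4 00001025 26240008 <ac820000> ac830004"

for tok in CODE.split():
    at_epc = tok.startswith("<")
    word = int(tok.strip("<>"), 16)
    print(f"{word:08x}  {decode(word)}" + ("   <- epc" if at_epc else ""))
```

The first word decodes with rt = 3, i.e. the load from 0xb4($fp) goes into $v1, which is the register stored to 4($a0) by the last word.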

One thing I see is that we do the RAC flush for BMIPS3300, 4350 and
4380, but only initialize the RAC for 3300 [1] and leave it in whatever
state the bootloader set up for the other ones. Maybe it has some
invalid config on (that particular?) 6358 that triggers issues later on
after a flush? E.g. the flush puts it in an error state, and the next
time something triggers a prefetch (write?), say by trying to access
userspace, it generates an error exception.

Just spitballing, though.

[1] https://elixir.bootlin.com/linux/latest/source/arch/mips/kernel/smp-bmips.c#L587

Jonas