Re: kernel panic with v5.18-rc1 on OpenPandora (only)

From: Arnd Bergmann
Date: Sat Apr 30 2022 - 11:36:33 EST


On Sat, Apr 30, 2022 at 3:16 PM H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
> > On 27.04.2022 at 11:37, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > On Wed, Apr 27, 2022 at 10:38 AM Tony Lindgren <tony@xxxxxxxxxxx> wrote:
>
> > You said that it still crashes without the wl1251 driver, so I assume
> > there is at least one more related bug. If you get a different call
> > chain without the driver, or with the kmalloc() call, can you post that
> > as well?
>
> For some time the panic did disappear, but the driver then reported e.g.
>
> [ 29.457946] omap_hsmmc 480ad000.mmc: found wl1251
> [ 29.516174] wl1251: ERROR unsupported chip id: 0xdb0aea56

That does point to invalid DMA addresses.

> (the value varied somewhat randomly), but just before sending out this mail I tried
> again (now with v5.18-rc4) and got this log (with wl1251 driver fixed as below):
>
> [ 31.069580] 1ec0: 00000000 c11198c0 c103226c 0000001a 00000000 c017d654 c1032200 c37c0a40
> [ 31.078155] 1ee0: 00000000 c1032200 c103226c c1032218 e0001f84 c0c77370 c37c0a40 c0def3c0
> [ 31.086761] 1f00: c0d02080 c017d78c c1032200 c103226c c1032218 c017d7f0 c1032200 c103226c
> [ 31.095336] 1f20: c1032218 c0181ee0 e0001f50 00000000 ffffffff c017cf6c e0001f50 c08b4d7c
> [ 31.103942] 1f40: c01013c4 600f0113 ffffffff c0100bec c37c0a40 c0c7e6c0 00000000 1ed15000
> [ 31.112548] 1f60: c0c7e6c0 c0c7e6c0 00000040 c0d02d00 c0c77370 c37c0a40 c0def3c0 c0d02080
> [ 31.121154] 1f80: c0c7d850 e0001fa0 c0101390 c01013c4 600f0113 ffffffff 00000051 c0101390
> [ 31.129730] 1fa0: e01b9e94 c37c0a40 c37c0a40 00400000 0000000a ffff96d9 c1037850 c0c7e6c0
> [ 31.138336] 1fc0: c0d02d00 c0c7e6c0 c37c0a40 c37c0a40 600f0113 ffffffff e01b9e94 c37c0a40
> [ 31.146911] 1fe0: c37c0a40 e01b9f60 e01b9e58 c0137314 c0158434 c013740c c0158434 c04c9c6c
> [ 31.155517] omap3_l3_app_irq from __handle_irq_event_percpu+0xb0/0x1dc
> [ 31.162475] __handle_irq_event_percpu from handle_irq_event_percpu+0xc/0x38
> [ 31.169891] handle_irq_event_percpu from handle_irq_event+0x38/0x5c
> [ 31.176605] handle_irq_event from handle_level_irq+0x7c/0xb4
> [ 31.182647] handle_level_irq from handle_irq_desc+0x1c/0x2c
> [ 31.188629] handle_irq_desc from generic_handle_arch_irq+0x28/0x3c
> [ 31.195220] generic_handle_arch_irq from __irq_svc+0x8c/0xcc
> [ 31.201263] Exception stack(0xe0001f50 to 0xe0001f98)
> [ 31.206604] 1f40: c37c0a40 c0c7e6c0 00000000 1ed15000
> [ 31.215179] 1f60: c0c7e6c0 c0c7e6c0 00000040 c0d02d00 c0c77370 c37c0a40 c0def3c0 c0d02080
> [ 31.223785] 1f80: c0c7d850 e0001fa0 c0101390 c01013c4 600f0113 ffffffff
> [ 31.230743] __irq_svc from __do_softirq+0x84/0x304
> [ 31.235870] __do_softirq from __irq_exit_rcu+0x8c/0xd4
> [ 31.241363] __irq_exit_rcu from irq_exit+0x8/0x10
> [ 31.246429] irq_exit from call_with_stack+0x18/0x20
> [ 31.251647] call_with_stack from __irq_svc+0x98/0xcc
> [ 31.256988] Exception stack(0xe01b9e60 to 0xe01b9ea8)
> [ 31.262298] 9e60: df993a40 c37c0a40 00000000 00000001 df993a40 c3245000 c133c2c0 00000002
> [ 31.270904] 9e80: c37c0a40 00000000 e01b9f60 e01b9edc e01b9ee0 e01b9eb0 c08ba55c c0158434
> [ 31.279479] 9ea0: 600f0113 ffffffff
> [ 31.283172] __irq_svc from finish_task_switch+0x12c/0x1ec
> [ 31.288940] finish_task_switch from __schedule+0x3cc/0x558
> [ 31.294799] __schedule from schedule+0x70/0xc0
> [ 31.299591] schedule from do_work_pending+0x30/0x3dc
> [ 31.304901] do_work_pending from slow_work_pending+0xc/0x20
> [ 31.310852] Exception stack(0xe01b9fb0 to 0xe01b9ff8)
> [ 31.316192] 9fa0: 00002cf8 00000000 50000000 b6f99000
> [ 31.324768] 9fc0: b6f9bcfc b6f9bcf8 00000000 00000000 00000010 00000000 00001e94 00000000
> [ 31.333374] 9fe0: b6f9bcf8 bea66f80 b6f9bcfc 004bfc6a 40070030 ffffffff
> [ 31.340332] Code: e0000002 e0011003 e1901001 0a000002 (e7f001f2)
> [ 31.346740] ---[ end trace 0000000000000000 ]---


I suppose this could be anywhere then. The backtrace seems to point
to re-enabling interrupts in do_work_pending, so something probably
accessed DMA memory asynchronously.


>
> rm -rf lib/modules/5.18.0-rc4-letux+/kernel/drivers/net/wireless/ti/wl1251
>
> done on the SD card makes the problems go away.

Good, so I guess that means there is another bug in wl1251 DMA handling,
while everything else is fine.

> diff --git a/drivers/net/wireless/ti/wl1251/io.c b/drivers/net/wireless/ti/wl1251/io.c
> index 5ebe7958ed5c7..76aceecc281fb 100644
> --- a/drivers/net/wireless/ti/wl1251/io.c
> +++ b/drivers/net/wireless/ti/wl1251/io.c
> @@ -121,7 +121,13 @@ void wl1251_set_partition(struct wl1251 *wl,
> u32 mem_start, u32 mem_size,
> u32 reg_start, u32 reg_size)
> {
> - struct wl1251_partition partition[2];
> + struct wl1251_partition_set *partition;
> +
> + partition = kmalloc(sizeof(*partition), GFP_KERNEL);
> + if (!partition) {
> + wl1251_error("can not set partition");
> + return;
> + }
>
> wl1251_debug(DEBUG_SPI, "mem_start %08X mem_size %08X",
> mem_start, mem_size);
> @@ -164,10 +170,10 @@ void wl1251_set_partition(struct wl1251 *wl,
> reg_start, reg_size);
> }
>
> - partition[0].start = mem_start;
> - partition[0].size = mem_size;
> - partition[1].start = reg_start;
> - partition[1].size = reg_size;
> + partition->mem.start = mem_start;
> + partition->mem.size = mem_size;
> + partition->reg.start = reg_start;
> + partition->reg.size = reg_size;
>
> wl->physical_mem_addr = mem_start;
> wl->physical_reg_addr = reg_start;
> @@ -176,5 +182,7 @@ void wl1251_set_partition(struct wl1251 *wl,
> wl->virtual_reg_addr = mem_size;
>
> wl->if_ops->write(wl, HW_ACCESS_PART0_SIZE_ADDR, partition,
> - sizeof(partition));
> + sizeof(*partition));
> +

Changing the type of the structure looks a bit odd, but it does seem
like a valid transformation otherwise.

I see more callers of wl1251_mem_write() or wl1251_mem_read() with
on-stack arguments in wl1251_tx_complete(), wl1251_event_wait(),
and wl1251_event_handle(). Those will need the same kmalloc()
change as your wl1251_set_partition() fix, I think.
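
The conversion for those callers can be sketched roughly as follows. This
is a userspace model of the pattern, not the driver code: fake_device_mem,
fake_bus_write() and write32_via_heap() are hypothetical names made up for
illustration, and malloc()/free() stand in for kmalloc(..., GFP_KERNEL)
and kfree().

```c
/*
 * Userspace sketch of bouncing a small value through a heap buffer
 * instead of passing the address of an on-stack variable to the bus.
 * All names here are hypothetical; in the kernel the allocation would
 * be kmalloc(..., GFP_KERNEL) and the release kfree().
 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the chip's memory window. */
static uint8_t fake_device_mem[64];

/* Hypothetical stand-in for wl->if_ops->write(): copy buf to the device. */
static void fake_bus_write(size_t addr, const void *buf, size_t len)
{
	memcpy(&fake_device_mem[addr], buf, len);
}

/* Write a 32-bit value without handing a stack address to the bus layer. */
static int write32_via_heap(size_t addr, uint32_t val)
{
	uint32_t *bounce;

	/* Reject writes that would run past the modelled window. */
	if (addr > sizeof(fake_device_mem) - sizeof(*bounce))
		return -1;

	bounce = malloc(sizeof(*bounce));
	if (!bounce)
		return -1;

	*bounce = val;
	fake_bus_write(addr, bounce, sizeof(*bounce));
	free(bounce);

	return 0;
}
```

The read direction would look the same, with the copy out of the heap
buffer happening after the bus transfer and before the buffer is freed.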

If that's not enough, try enabling CONFIG_DMA_API_DEBUG
to get an is_vmalloc_addr() check on every DMA operation.

Arnd