Re: chipidea: udc: kernel panic in isr_setup_status_phase

From: Peter Chen
Date: Wed Aug 24 2016 - 04:21:28 EST

Next message: Joel: "Re: [RFC/PATCHSET 0/3] virtio: Implement virtio pstore device (v3)"
Previous message: Mel Gorman: "Re: what is the purpose of SLAB and SLUB (was: Re: [PATCH v3] mm/slab: Improve performance of gathering slabinfo) stats"
Next in thread: Clemens Gruber: "Re: chipidea: udc: kernel panic in isr_setup_status_phase"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Aug 23, 2016 at 02:36:30AM +0200, Clemens Gruber wrote:
> Hi,
>
> I am using an i.MX6Q embedded board, acting as an Ethernet gadget with
> RNDIS function, connected over a USB OTG cable to a PC.
> Most of the time it works fine, but in some mysterious circumstances,
> a kernel panic occurs, just after attaching the OTG cable, connecting it
> to the other machine:
>
> [ 54.012989] Unable to handle kernel NULL pointer dereference at virtual address 00000020
> [ 54.021099] pgd = 80004000
> [ 54.023816] [00000020] *pgd=00000000
> [ 54.027422] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
> [ 54.032915] Modules linked in:
> [ 54.035998] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc3-00017-g336bc4a #315
> [ 54.043662] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 54.050196] task: 80b05f80 task.stack: 80b00000
> [ 54.054744] PC is at isr_setup_status_phase+0x1c/0x40
> [ 54.059805] LR is at 0xbe570890
> [ 54.062957] pc : [<804ac464>] lr : [<be570890>] psr: 200e0193
> [ 54.062957] sp : 80b01e10 ip : be570570 fp : be570890
> [ 54.074442] r10: be5eeebc r9 : be570010 r8 : be5eeebc
> [ 54.079673] r7 : be5708d0 r6 : be5eee80 r5 : be7fcf40 r4 : 00000001
> [ 54.086206] r3 : be571010 r2 : 804ab368 r1 : 00000000 r0 : be570010
> [ 54.092742] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
> [ 54.099972] Control: 10c5387d Table: 4e34404a DAC: 00000051
> [ 54.105723] Process swapper/0 (pid: 0, stack limit = 0x80b00210)
> (snip)
> [ 54.247100] [<804ac464>] (isr_setup_status_phase) from [<804acbbc>] (isr_tr_complete_handler+0x734/0x98c)
> [ 54.256680] [<804acbbc>] (isr_tr_complete_handler) from [<804acfc0>] (udc_irq+0x1ac/0x318)
> [ 54.264964] [<804acfc0>] (udc_irq) from [<8018ba28>] (__handle_irq_event_percpu+0x9c/0x128)
> [ 54.273330] [<8018ba28>] (__handle_irq_event_percpu) from [<8018bae0>] (handle_irq_event_percpu+0x2c/0x7c)
> [ 54.282995] [<8018bae0>] (handle_irq_event_percpu) from [<8018bb68>] (handle_irq_event+0x38/0x5c)
> [ 54.291880] [<8018bb68>] (handle_irq_event) from [<8018f2cc>] (handle_fasteoi_irq+0xd0/0x1bc)
> [ 54.300418] [<8018f2cc>] (handle_fasteoi_irq) from [<8018afb0>] (generic_handle_irq+0x24/0x34)
> [ 54.309042] [<8018afb0>] (generic_handle_irq) from [<8018b2dc>] (__handle_domain_irq+0x7c/0xec)
> [ 54.317754] [<8018b2dc>] (__handle_domain_irq) from [<80101524>] (gic_handle_irq+0x38/0x74)
> [ 54.326119] [<80101524>] (gic_handle_irq) from [<8010ccb0>] (__irq_svc+0x70/0xb0)
> (snip)
>
> After looking through the isr_setup_status_phase disassembly, I found
> that ci->status must have been NULL and dereferencing it in
> ci->status->context = ci; triggered the panic.
>
> The interrupt was a USBINT (UI bit was set) and isr_tr_complete_handler
> was called from udc_irq.
> In the IMX6DQRM I read about the UI bit: "This bit is also set by the
> Host/Device Controller when a short packet is detected." and about
> USBERRINT / UEI bit: "This bit is set along with the USBINT bit, if the
> TD on which the error interrupt occurred also had its interrupt on
> complete (IOC) bit set." (page 5494)
>
> However, we do not check for UEI in udc_irq.
> Could this be the cause of this error?

UEI is an error interrupt, and the software does not handle it, so it
will not affect ci->status.

> Should we only call isr_tr_complete_handler if UI && !UEI ?
>
> Or would adding a check for ci->status == NULL in isr_setup_status_phase
> and returning an error code also be a good idea?

I agree with that.
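
A rough sketch of what such a check could look like, written as a
standalone userspace mock so it can be compiled on its own. The struct
definitions below are simplified stand-ins, not the real struct ci_hdrc
and struct usb_request, and the -EPIPE return value is an assumption:
the thread does not say which error code should be returned.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the driver types; only the fields used here. */
struct ci_hdrc;

struct usb_request {
	void *context;
	void (*complete)(struct ci_hdrc *ci, struct usb_request *req);
};

struct ci_hdrc {
	struct usb_request *status;	/* NULL in the reported oops */
};

/* Placeholder completion callback for the status request. */
static void isr_setup_status_complete(struct ci_hdrc *ci,
				      struct usb_request *req)
{
	(void)ci;
	(void)req;
}

/*
 * Mock of isr_setup_status_phase() with the proposed guard: bail out
 * with an error instead of dereferencing a NULL ci->status.
 */
static int isr_setup_status_phase(struct ci_hdrc *ci)
{
	if (ci->status == NULL)
		return -EPIPE;	/* assumed error code */

	ci->status->context = ci;
	ci->status->complete = isr_setup_status_complete;

	/* The real driver would queue the status request at this point. */
	return 0;
}
```

With a guard like this the caller sees an error for the setup transfer
instead of the NULL dereference shown in the log above. It does not
explain why ci->status was NULL in the first place.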

>
> Do you have an idea what's going on there and why ci->status is NULL?
>

I can't explain it. The only possibility I can see is that the last
disconnect event (see ci_udc_vbus_session->_gadget_stop_activity) was
scheduled very late because vbus dropped very slowly.

--

Best Regards,
Peter Chen
