Re: suspend problem with dwc3 gadget / g_multi when usb disconnected

From: Martin Kepplinger
Date: Mon Dec 16 2019 - 05:10:17 EST


On 13.12.19 13:19, Martin Kepplinger wrote:
>
>
> On 12.12.19 13:39, Felipe Balbi wrote:
>>
>> Hi,
>>
>> Martin Kepplinger <martin.kepplinger@xxxxxxx> writes:
>
>
>>> [ 64.405226] Modules linked in: rfkill st_magn_spi st_sensors_spi
>>> mousedev qmi_wwan caam_jr caamhash_desc caamalg_desc cdc_wdm option
>>> usbnet usb_wwan mii usbserial st_magn_i2c st_magn st_sensors_i2c
>>> st_sensors industrialio_triggered_buffer goodix kfifo_buf tcpci tcpm
>>> roles imx2_wdt watchdog aes_ce_blk crypto_simd typec bq25890_charger
>>> caam imx_sdma virt_dma error crct10dif_ce ghash_ce sha2_ce sha1_ce
>>> gpio_vibra usb_f_acm u_serial usb_f_rndis g_multi usb_f_mass_storage
>>> u_ether libcomposite ip_tables x_tables ipv6 nf_defrag_ipv6
>>> xhci_plat_hcd xhci_hcd usbcore dwc3 snvs_pwrkey ulpi udc_core
>>> phy_fsl_imx8mq_usb usb_common
>>> [ 64.462126] CPU: 3 PID: 884 Comm: ip Not tainted
>>> 5.5.0-rc1-next-20191212-librem5-00019-g4fb81fdd9c7d #1
>>> [ 64.471909] Hardware name: Purism Librem 5 devkit (DT)
>>> [ 64.477264] pstate: 60000085 (nZCv daIf -PAN -UAO)
>>> [ 64.482268] pc : dwc3_stop_active_transfer.constprop.38+0xe8/0x100 [dwc3]
>>> [ 64.489351] lr : dwc3_stop_active_transfer.constprop.38+0x64/0x100 [dwc3]
>>> [ 64.496421] sp : ffff00009f213240
>>> [ 64.499874] x29: ffff00009f213240 x28: 0000000000000000
>>> [ 64.505410] x27: ffff00009f2137b0 x26: ffff0000a4e0cc10
>>> [ 64.510946] x25: 0000000000000001 x24: ffff0000a324518c
>>> [ 64.516482] x23: ffff0000a3245193 x22: ffff0000a41f0080
>>> [ 64.522018] x21: 0000000000000000 x20: ffff0000a4176600
>>> [ 64.527553] x19: ffff800010ee87c8 x18: 0000000000000000
>>> [ 64.533089] x17: 0000000000000000 x16: 0000000000000000
>>> [ 64.538625] x15: 0000000000000000 x14: 0000000000000000
>>> [ 64.544160] x13: ffff0000a4176868 x12: ffff0000a4176868
>>> [ 64.549696] x11: ffffffffffff3f08 x10: 0000000000000008
>>> [ 64.555232] x9 : ffffffffffff3f04 x8 : 0000000000000811
>>> [ 64.560767] x7 : ffffffffffff3f00 x6 : ffffffffffff3f0c
>>> [ 64.566303] x5 : 0000000000000000 x4 : ffff80001007483c
>>> [ 64.571839] x3 : 0000000000000808 x2 : ffff00009f21327c
>>> [ 64.577375] x1 : 0000000000000040 x0 : 00000000ffffff92
>>> [ 64.582911] Call trace:
>>> [ 64.585474] dwc3_stop_active_transfer.constprop.38+0xe8/0x100 [dwc3]
>>> [ 64.592194] __dwc3_gadget_ep_disable+0x34/0x380 [dwc3]
>>> [ 64.597649] dwc3_gadget_ep_disable+0x44/0xf8 [dwc3]
>>> [ 64.602841] usb_ep_disable+0x34/0x100 [udc_core]
>>> [ 64.607749] eth_stop+0xb4/0x130 [u_ether]
>>> [ 64.612026] __dev_close_many+0xb4/0x138
>>> [ 64.616116] __dev_change_flags+0xb8/0x1d0
>>> [ 64.620385] dev_change_flags+0x28/0x68
>>> [ 64.624384] do_setlink+0x30c/0xc90
>>> [ 64.628021] __rtnl_newlink+0x3f8/0x788
>>> [ 64.632018] rtnl_newlink+0x54/0x80
>>> [ 64.635654] rtnetlink_rcv_msg+0x128/0x370
>>> [ 64.639926] netlink_rcv_skb+0x60/0x120
>>> [ 64.643924] rtnetlink_rcv+0x1c/0x28
>>> [ 64.647650] netlink_unicast+0x1b8/0x278
>>> [ 64.651739] netlink_sendmsg+0x1ac/0x3b0
>>> [ 64.655829] ____sys_sendmsg+0x250/0x298
>>> [ 64.659918] ___sys_sendmsg+0x88/0xc8
>>> [ 64.663735] __sys_sendmsg+0x70/0xc0
>>> [ 64.667463] __arm64_sys_sendmsg+0x28/0x30
>>> [ 64.671736] el0_svc_common.constprop.3+0x98/0x170
>>> [ 64.676729] el0_svc_handler+0x20/0x28
>>> [ 64.680638] el0_sync_handler+0x134/0x1a0
>>> [ 64.684817] el0_sync+0x140/0x180
>>> [ 64.688271] ---[ end trace 4f6aa846a9c6f20c ]---
>>>
>>> ... again. at every shutdown or rmmod.
>>>
>>> I hope that this might be already helpful.
>>
>> I need to see the trace events for this failure case too. You could add
>> ftrace_dump_on_oops to your cmdline and change the WARN_ONCE() to a
>> BUG_ON() or something.
>>
>
> doing that and having "ftrace=function ftrace_dump_on_oops" dumps way
> too much to handle and somehow my board resets while printing...
>
> I found I can avoid the above error during shutdown though by keeping a
> few regulators enabled for now. I'm not entirely sure if that's a dwc3
> problem now, so let's leave the above for now.
>
> Back to my first issue: disconnected USB and suspend:
>
> I disable all gadget configs now.
>
> "echo mem > /sys/power/state" still only hangs the system - again: all
> that's in the logs is:
>
> [ 97.405251] PM: suspend entry (deep)
>
> and on the console then:
>
> [ 118.608738] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 118.615093] rcu: 1-...0: (0 ticks this GP)
> idle=e86/1/0x4000000000000000 softirq=16109/16109 fqs=2626
> [ 181.628739] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 181.635088] rcu: 1-...0: (0 ticks this GP)
> idle=e86/1/0x4000000000000000 softirq=16109/16109 fqs=10503
> [ 244.648738] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 244.655087] rcu: 1-...0: (0 ticks this GP)
> idle=e86/1/0x4000000000000000 softirq=16109/16109 fqs=17402
>
> ... not sure how to further debug that.
>

actually I do :) more logs to this: USB connected / disconnected
doesn't matter: we stop during resume at:

[ 121.997310] dwc3 38100000.usb: driver resume

(then sometimes)

[ 173.919770] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 173.926220] rcu: 0-...!: (2 ticks this GP)
idle=f32/1/0x4000000000000000 softirq=9941/9941 fqs=219
[ 173.935809] (detected by 1, t=12983 jiffies, g=27649, q=0)
[ 173.941687] Task dump for CPU 0:
[ 173.945122] bash R running task 0 834 833
0x00000008
[ 173.952560] Call trace:
[ 173.955197] __switch_to+0xc0/0x210
[ 173.958962] 0xf601ec1e0eef7d00
[ 173.962317] rcu: rcu_preempt kthread starved for 12545 jiffies!
g27649 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
[ 173.973347] rcu: RCU grace-period kthread stack dump:
[ 173.978678] rcu_preempt I 0 10 2 0x00000028
[ 173.984476] Call trace:
[ 173.987103] __switch_to+0xc0/0x210
[ 173.990820] __schedule+0x2d4/0x5d0
[ 173.994532] schedule+0x48/0x100
[ 173.997972] schedule_timeout+0x1ac/0x410
[ 174.002231] rcu_gp_kthread+0x484/0x12f8
[ 174.006396] kthread+0x124/0x128
[ 174.009835] ret_from_fork+0x10/0x18