Re: [PATCH] usb: dwc2: Revert "usb: dwc2: Disable all EP's on disconnect"

From: Marek Szyprowski
Date: Fri Nov 23 2018 - 04:49:41 EST


Hi Minas,

On 2018-11-22 07:53, Minas Harutyunyan wrote:
> On 11/21/2018 7:45 PM, Marek Szyprowski wrote:
>> This reverts commit dccf1bad4be7eaa096c1f3697bd37883f9a08ecb.
>>
>> The reverted commit breaks locking in the DWC2 driver. It causes random
>> crashes or memory corruption related issues on SMP machines. Here is an
>> example of the observed reproducible issue (other are a bit more random):
>>
>> =====================================
>> WARNING: bad unlock balance detected!
>> 4.19.0-rc6-00027-gdccf1bad4be7-dirty #1119 Not tainted
>> -------------------------------------
>> ip/1464 is trying to release lock (&(&hsotg->lock)->rlock) at:
>> [<c0615494>] dwc2_hsotg_complete_request+0x84/0x218
>> but there are no more locks to release!
>>
>> other info that might help us debug this:
>> 2 locks held by ip/1464:
>> #0: d69babd3 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x224/0x488
>> #1: 5fb350d2 (&(&dev->lock)->rlock){-.-.}, at: eth_stop+0x24/0xa8
>>
>> stack backtrace:
>> CPU: 1 PID: 1464 Comm: ip Not tainted 4.19.0-rc6-00027-gdccf1bad4be7-dirty #1119
>> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>> [<c0111f9c>] (unwind_backtrace) from [<c010deb8>] (show_stack+0x10/0x14)
>> [<c010deb8>] (show_stack) from [<c09d3398>] (dump_stack+0x90/0xc8)
>> [<c09d3398>] (dump_stack) from [<c017d02c>] (print_unlock_imbalance_bug+0xb0/0xe0)
>> [<c017d02c>] (print_unlock_imbalance_bug) from [<c01800dc>] (lock_release+0x1a4/0x374)
>> [<c01800dc>] (lock_release) from [<c09ef7fc>] (_raw_spin_unlock+0x18/0x54)
>> [<c09ef7fc>] (_raw_spin_unlock) from [<c0615494>] (dwc2_hsotg_complete_request+0x84/0x218)
>> [<c0615494>] (dwc2_hsotg_complete_request) from [<c0617530>] (kill_all_requests+0x44/0xb4)
>> [<c0617530>] (kill_all_requests) from [<c0617690>] (dwc2_hsotg_ep_disable+0xf0/0x200)
>> [<c0617690>] (dwc2_hsotg_ep_disable) from [<c0659698>] (usb_ep_disable+0xd0/0x1c8)
>> [<c0659698>] (usb_ep_disable) from [<c065d4a8>] (eth_stop+0x68/0xa8)
>> [<c065d4a8>] (eth_stop) from [<c077e0b8>] (__dev_close_many+0x94/0xfc)
>> [<c077e0b8>] (__dev_close_many) from [<c078a064>] (__dev_change_flags+0xa0/0x198)
>> [<c078a064>] (__dev_change_flags) from [<c078a17c>] (dev_change_flags+0x18/0x48)
>> [<c078a17c>] (dev_change_flags) from [<c07a1a98>] (do_setlink+0x298/0x990)
>> [<c07a1a98>] (do_setlink) from [<c07a29f0>] (rtnl_newlink+0x4a0/0x6fc)
>> [<c07a29f0>] (rtnl_newlink) from [<c079dd74>] (rtnetlink_rcv_msg+0x254/0x488)
>> [<c079dd74>] (rtnetlink_rcv_msg) from [<c07c47f4>] (netlink_rcv_skb+0xe0/0x118)
>> [<c07c47f4>] (netlink_rcv_skb) from [<c07c4094>] (netlink_unicast+0x180/0x1c8)
>> [<c07c4094>] (netlink_unicast) from [<c07c4428>] (netlink_sendmsg+0x2bc/0x348)
>> [<c07c4428>] (netlink_sendmsg) from [<c0760b9c>] (sock_sendmsg+0x14/0x24)
>> [<c0760b9c>] (sock_sendmsg) from [<c0761af8>] (___sys_sendmsg+0x22c/0x248)
>> [<c0761af8>] (___sys_sendmsg) from [<c0762740>] (__sys_sendmsg+0x40/0x6c)
>> [<c0762740>] (__sys_sendmsg) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
>> Exception stack(0xede65fa8 to 0xede65ff0)
>> ...
>>
>> Signed-off-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>> ---
>> The suspicious locking has been already pointed in the review of the
>> first version of this patch, but sadly the v2 didn't change it and
>> landed in v4.20-rc1.
>> ---
>> drivers/usb/dwc2/gadget.c | 30 +++++++-----------------------
>> 1 file changed, 7 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c
>> index 79189db4bf17..220c0f9b89b0 100644
>> --- a/drivers/usb/dwc2/gadget.c
>> +++ b/drivers/usb/dwc2/gadget.c
>> @@ -3109,8 +3109,6 @@ static void kill_all_requests(struct dwc2_hsotg *hsotg,
>> dwc2_hsotg_txfifo_flush(hsotg, ep->fifo_index);
>> }
>>
>> -static int dwc2_hsotg_ep_disable(struct usb_ep *ep);
>> -
>> /**
>> * dwc2_hsotg_disconnect - disconnect service
>> * @hsotg: The device state.
>> @@ -3129,12 +3127,13 @@ void dwc2_hsotg_disconnect(struct dwc2_hsotg *hsotg)
>> hsotg->connected = 0;
>> hsotg->test_mode = 0;
>>
>> - /* all endpoints should be shutdown */
>> for (ep = 0; ep < hsotg->num_of_eps; ep++) {
>> if (hsotg->eps_in[ep])
>> - dwc2_hsotg_ep_disable(&hsotg->eps_in[ep]->ep);
>> + kill_all_requests(hsotg, hsotg->eps_in[ep],
>> + -ESHUTDOWN);
>> if (hsotg->eps_out[ep])
>> - dwc2_hsotg_ep_disable(&hsotg->eps_out[ep]->ep);
>> + kill_all_requests(hsotg, hsotg->eps_out[ep],
>> + -ESHUTDOWN);
>> }
>>
>> call_gadget(hsotg, disconnect);
>> @@ -3192,23 +3191,13 @@ void dwc2_hsotg_core_init_disconnected(struct dwc2_hsotg *hsotg,
>> u32 val;
>> u32 usbcfg;
>> u32 dcfg = 0;
>> - int ep;
>>
>> /* Kill any ep0 requests as controller will be reinitialized */
>> kill_all_requests(hsotg, hsotg->eps_out[0], -ECONNRESET);
>>
>> - if (!is_usb_reset) {
>> + if (!is_usb_reset)
>> if (dwc2_core_reset(hsotg, true))
>> return;
>> - } else {
>> - /* all endpoints should be shutdown */
>> - for (ep = 1; ep < hsotg->num_of_eps; ep++) {
>> - if (hsotg->eps_in[ep])
>> - dwc2_hsotg_ep_disable(&hsotg->eps_in[ep]->ep);
>> - if (hsotg->eps_out[ep])
>> - dwc2_hsotg_ep_disable(&hsotg->eps_out[ep]->ep);
>> - }
>> - }
>>
>> /*
>> * we must now enable ep0 ready for host detection and then
>> @@ -4004,7 +3993,6 @@ static int dwc2_hsotg_ep_disable(struct usb_ep *ep)
>> unsigned long flags;
>> u32 epctrl_reg;
>> u32 ctrl;
>> - int locked;
>>
>> dev_dbg(hsotg->dev, "%s(ep %p)\n", __func__, ep);
>>
>> @@ -4020,9 +4008,7 @@ static int dwc2_hsotg_ep_disable(struct usb_ep *ep)
>>
>> epctrl_reg = dir_in ? DIEPCTL(index) : DOEPCTL(index);
>>
>> - locked = spin_is_locked(&hsotg->lock);
>> - if (!locked)
>> - spin_lock_irqsave(&hsotg->lock, flags);
>> + spin_lock_irqsave(&hsotg->lock, flags);
>>
>> ctrl = dwc2_readl(hsotg, epctrl_reg);
>>
>> @@ -4046,9 +4032,7 @@ static int dwc2_hsotg_ep_disable(struct usb_ep *ep)
>> hs_ep->fifo_index = 0;
>> hs_ep->fifo_size = 0;
>>
>> - if (!locked)
>> - spin_unlock_irqrestore(&hsotg->lock, flags);
>> -
>> + spin_unlock_irqrestore(&hsotg->lock, flags);
>> return 0;
>> }
>>
>>
> Could you please apply "[PATCH v2] usb: dwc2: Disable all EP's on
> disconnect" and fix for that patch "[PATCH] usb: dwc2: Fix ep disable
> spinlock flow.". Let me know on test results.

I've checked and even with your fix "[PATCH] usb: dwc2: Fix ep disable
spinlock flow.", the issue is there. Playing conditionally with
spinlocks seems to be very bad idea. When spinlock is taken, the other
CPU is touching DWC2 registers. Don't expect that if you skip spinlock
and forcibly change DWC2 registers, the hardware will do what you really
expects... There random crashes and lockdep still prints a warning:

=====================================
WARNING: bad unlock balance detected!
4.20.0-rc3-00005-g1246b0e2e509 #5096 Not tainted
-------------------------------------
ip/1491 is trying to release lock (&(&hsotg->lock)->rlock) at:
[<c061c2ec>] dwc2_hsotg_complete_request+0x84/0x214
but there are no more locks to release!

other info that might help us debug this:
2 locks held by ip/1491:
Â#0: e2180f30 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x224/0x488
Â#1: 68ab9014 (&(&dev->lock)->rlock){-.-.}, at: eth_stop+0x24/0xa8

stack backtrace:
CPU: 1 PID: 1491 Comm: ip Not tainted 4.20.0-rc3-00005-g1246b0e2e509 #5096
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c0111a0c>] (unwind_backtrace) from [<c010d8f4>] (show_stack+0x10/0x14)
[<c010d8f4>] (show_stack) from [<c09e4198>] (dump_stack+0x90/0xc8)
[<c09e4198>] (dump_stack) from [<c017d2c8>]
(print_unlock_imbalance_bug+0xb0/0xe0)
[<c017d2c8>] (print_unlock_imbalance_bug) from [<c0180404>]
(lock_release+0x190/0x380)
[<c0180404>] (lock_release) from [<c0a042a4>] (_raw_spin_unlock+0x18/0x54)
[<c0a042a4>] (_raw_spin_unlock) from [<c061c2ec>]
(dwc2_hsotg_complete_request+0x84/0x214)
[<c061c2ec>] (dwc2_hsotg_complete_request) from [<c061e378>]
(kill_all_requests+0x44/0xb4)
[<c061e378>] (kill_all_requests) from [<c061e4fc>]
(dwc2_hsotg_ep_disable+0x114/0x21c)
[<c061e4fc>] (dwc2_hsotg_ep_disable) from [<c066085c>]
(usb_ep_disable+0xd0/0x1c8)
[<c066085c>] (usb_ep_disable) from [<c0664664>] (eth_stop+0x68/0xa8)
[<c0664664>] (eth_stop) from [<c07871b8>] (__dev_close_many+0x94/0xfc)
[<c07871b8>] (__dev_close_many) from [<c07931b0>]
(__dev_change_flags+0xa0/0x198)
[<c07931b0>] (__dev_change_flags) from [<c07932c8>]
(dev_change_flags+0x18/0x48)
[<c07932c8>] (dev_change_flags) from [<c07ab798>] (do_setlink+0x298/0x990)
[<c07ab798>] (do_setlink) from [<c07ac760>] (rtnl_newlink+0x4a8/0x70c)
[<c07ac760>] (rtnl_newlink) from [<c07a7688>]
(rtnetlink_rcv_msg+0x254/0x488)
[<c07a7688>] (rtnetlink_rcv_msg) from [<c07cf844>]
(netlink_rcv_skb+0xe0/0x118)
[<c07cf844>] (netlink_rcv_skb) from [<c07cf0e4>]
(netlink_unicast+0x180/0x1c8)
[<c07cf0e4>] (netlink_unicast) from [<c07cf478>]
(netlink_sendmsg+0x2bc/0x348)
[<c07cf478>] (netlink_sendmsg) from [<c0769a38>] (sock_sendmsg+0x14/0x24)
[<c0769a38>] (sock_sendmsg) from [<c076a9b0>] (___sys_sendmsg+0x22c/0x248)
[<c076a9b0>] (___sys_sendmsg) from [<c076b5f8>] (__sys_sendmsg+0x40/0x6c)
[<c076b5f8>] (__sys_sendmsg) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xedec5fa8 to 0xedec5ff0)
...


Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland