RE: [PATCH 2/2] Drivers: hv: vmbus: offload the handling of channels to two workqueues

From: KY Srinivasan
Date: Tue Nov 27 2018 - 00:22:32 EST




> -----Original Message-----
> From: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> Sent: Monday, November 26, 2018 11:35 AM
> To: KY Srinivasan <kys@xxxxxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; devel@xxxxxxxxxxxxxxxxxxxxxx;
> olaf@xxxxxxxxx; apw@xxxxxxxxxxxxx; jasowang@xxxxxxxxxx; Stephen
> Hemminger <sthemmin@xxxxxxxxxxxxx>; Michael Kelley
> <mikelley@xxxxxxxxxxxxx>; vkuznets <vkuznets@xxxxxxxxxx>; Haiyang
> Zhang <haiyangz@xxxxxxxxxxxxx>; stable@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 2/2] Drivers: hv: vmbus: offload the handling of channels
> to two workqueues
>
> On Mon, Nov 26, 2018 at 02:29:57AM +0000, kys@xxxxxxxxxxxxxxxxx wrote:
> > From: Dexuan Cui <decui@xxxxxxxxxxxxx>
> >
> > vmbus_process_offer() mustn't call channel->sc_creation_callback()
> > directly for sub-channels, because sc_creation_callback() ->
> > vmbus_open() may never get the host's response to the
> > OPEN_CHANNEL message (the host may rescind a channel at any time,
> > e.g. in the case of hot removing a NIC), and vmbus_onoffer_rescind()
> > may not wake up the vmbus_open() as it's blocked due to a non-zero
> > vmbus_connection.offer_in_progress, and finally we have a deadlock.
> >
> > The above is also true for primary channels, if the related device
> > drivers use sync probing mode by default.
> >
> > And, usually the handling of primary channels and sub-channels can
> > depend on each other, so we should offload them to different
> > workqueues to avoid possible deadlock, e.g. in sync-probing mode,
> > NIC1's netvsc_subchan_work() can race with NIC2's netvsc_probe() ->
> > rtnl_lock(), and causes deadlock: the former gets the rtnl_lock
> > and waits for all the sub-channels to appear, but the latter
> > can't get the rtnl_lock and this blocks the handling of sub-channels.
> >
> > The patch can fix the multiple-NIC deadlock described above for
> > v3.x kernels (e.g. RHEL 7.x) which don't support async-probing
> > of devices, and v4.4, v4.9, v4.14 and v4.18 which support async-probing
> > but don't enable async-probing for Hyper-V drivers (yet).
> >
> > The patch can also fix the hang issue in sub-channel's handling described
> > above for all versions of kernels, including v4.19 and v4.20-rc3.
> >
> > So the patch should be applied to all the existing kernels.
> >
> > Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Cc: Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>
> > Cc: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
> > Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
> > Signed-off-by: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
> > ---
> > drivers/hv/channel_mgmt.c | 188 +++++++++++++++++++++++++-------------
> > drivers/hv/connection.c | 24 ++++-
> > drivers/hv/hyperv_vmbus.h | 7 ++
> > include/linux/hyperv.h | 7 ++
> > 4 files changed, 161 insertions(+), 65 deletions(-)
>
> As Sasha pointed out, this patch does not even apply :(

Sorry about that. These patches applied cleanly on my tree (misc-next).
This series is meant to be applied on top of the patch
0001-Drivers-hv-vmbus-Remove-the-useless-API-vmbus_get_ou.patch.
That patch has been committed to the char-misc-testing branch, but it is not
in the misc-linus branch, which is why the series fails to apply.

Regards,

K. Y