Re: [PATCH] [repost] Drivers: hv: vmbus: Offload the handling of channels to two workqueues

From: gregkh@xxxxxxxxxxxxxxxxxxx
Date: Thu Nov 29 2018 - 02:44:41 EST


On Thu, Nov 29, 2018 at 04:36:43AM +0000, Dexuan Cui wrote:
>
> vmbus_process_offer() mustn't call channel->sc_creation_callback()
> directly for sub-channels, because sc_creation_callback() ->
> vmbus_open() may never get the host's response to the
> OPEN_CHANNEL message (the host may rescind a channel at any time,
> e.g. in the case of hot removing a NIC), and vmbus_onoffer_rescind()
> may not wake up the vmbus_open() as it's blocked due to a non-zero
> vmbus_connection.offer_in_progress, and finally we have a deadlock.
>
> The above is also true for primary channels, if the related device
> drivers use sync probing mode by default.
>
> And, usually the handling of primary channels and sub-channels can
> depend on each other, so we should offload them to different
> workqueues to avoid possible deadlock, e.g. in sync-probing mode,
> NIC1's netvsc_subchan_work() can race with NIC2's netvsc_probe() ->
> rtnl_lock(), and causes deadlock: the former gets the rtnl_lock
> and waits for all the sub-channels to appear, but the latter
> can't get the rtnl_lock and this blocks the handling of sub-channels.
>
> The patch can fix the multiple-NIC deadlock described above for
> v3.x kernels (e.g. RHEL 7.x) which don't support async-probing
> of devices, and v4.4, v4.9, v4.14 and v4.18 which support async-probing
> but don't enable async-probing for Hyper-V drivers (yet).
>
> The patch can also fix the hang issue in sub-channel's handling described
> above for all versions of kernels, including v4.19 and v4.20-rc4.
>
> So, actually the patch should be applied to all the existing kernels,
> not only the kernels that have 8195b1396ec8.
>
> Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>
> Cc: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
> Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
> Signed-off-by: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
> ---
>
> There is no change in this repost. I just rebased this patch to today's
> char-misc's char-misc-next branch. Previously KY posted the patch with his
> Signed-off-by (which is kept in this repost), but there was a conflict issue.
>
> Note: the patch can't be cleanly applied to char-misc's char-misc-linus branch --
> to do that, we need to cherry-pick the supporting patch first:
> 4d3c5c69191f ("Drivers: hv: vmbus: Remove the useless API vmbus_get_outgoing_channel()")

That is not going to work for the obvious reason that this dependant
patch is not going to be merged into 4.20-final.

So, what do you expect us to do here? The only way this can be accepted
is to have it go into my -next branch, which means it will show up in
4.21-rc1, is that ok?

But then, if that happens, it will fail to apply to any stable tree for
4.20 and older, like you are asking it to be done for.

So what do you expect me to do here with this?

totally confused,

greg k-h