Re: [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts

From: Stephen Hemminger
Date: Tue Jan 23 2018 - 11:33:43 EST


On Tue, 23 Jan 2018 10:34:03 +0100
Mohammed Gamal <mgamal@xxxxxxxxxx> wrote:

> Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split") introduced
> a regression that caused VMs not to shut down after netvsc_device_remove() is
> called. This is caused by the change to the GPADL teardown sequence, and while
> that change was necessary to fix issues with Win2016 hosts, it did introduce a
> regression for earlier versions.
>
> Prior to commit 0cf737808 the call sequence in netvsc_device_remove() was as
> follows (as implemented in netvsc_destroy_buf()):
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Teardown receive buffer GPADL
> 3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 4- Teardown send buffer GPADL
> 5- Close vmbus
>
> This didn't work for WS2016 hosts. Commit 0cf737808 split netvsc_destroy_buf()
> into two functions and rearranged the order as follows:
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 3- Close vmbus
> 4- Teardown receive buffer GPADL
> 5- Teardown send buffer GPADL
>
> That worked well for WS2016 hosts, but for WS2012 hosts it prevented VMs from
> shutting down.
>
> This patch series works around this problem. The first patch splits
> netvsc_revoke_buf() and netvsc_teardown_gpadl() into two finer-grained
> functions for tearing down send and receive buffers individually. The second
> patch uses the finer-grained functions to implement the teardown sequence
> according to the host's version. We keep the behavior introduced in
> 0cf737808ae7 for Windows 2016 hosts, while we re-introduce the old sequence
> for earlier versions.
>
> Mohammed Gamal (2):
> hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
> hv_netvsc: Change GPADL teardown order according to Hyper-V version
>
> drivers/net/hyperv/netvsc.c | 50 +++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 42 insertions(+), 8 deletions(-)
>
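
For reference, the two orderings described above can be modelled roughly as
below. This is only a userspace sketch, not the patch itself: the enum, the
trace struct, device_remove_sketch() and the host_is_ws2016 flag are all
made-up names for illustration, and the real driver calls are replaced by
simply recording each step.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical labels for the teardown steps listed in the cover letter. */
enum step {
	REVOKE_RECV_BUF,	/* send NVSP_MSG1_TYPE_REVOKE_RECV_BUF */
	TEARDOWN_RECV_GPADL,
	REVOKE_SEND_BUF,	/* send NVSP_MSG1_TYPE_REVOKE_SEND_BUF */
	TEARDOWN_SEND_GPADL,
	CLOSE_VMBUS,
};

#define MAX_STEPS 8

/* Records the order in which steps were issued. */
struct trace {
	enum step steps[MAX_STEPS];
	int n;
};

static void do_step(struct trace *t, enum step s)
{
	assert(t->n < MAX_STEPS);
	t->steps[t->n++] = s;
}

/*
 * Assumption: host_is_ws2016 stands in for whatever host version check
 * the series uses; how that check is made is not modelled here.
 */
static void device_remove_sketch(struct trace *t, int host_is_ws2016)
{
	t->n = 0;

	if (host_is_ws2016) {
		/* Ordering introduced by 0cf737808ae7. */
		do_step(t, REVOKE_RECV_BUF);
		do_step(t, REVOKE_SEND_BUF);
		do_step(t, CLOSE_VMBUS);
		do_step(t, TEARDOWN_RECV_GPADL);
		do_step(t, TEARDOWN_SEND_GPADL);
	} else {
		/* Ordering used before 0cf737808ae7. */
		do_step(t, REVOKE_RECV_BUF);
		do_step(t, TEARDOWN_RECV_GPADL);
		do_step(t, REVOKE_SEND_BUF);
		do_step(t, TEARDOWN_SEND_GPADL);
		do_step(t, CLOSE_VMBUS);
	}
}
```

The only difference between the two branches is whether the vmbus channel is
closed before or after the GPADL teardowns.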

The problem the original commit was trying to solve was actions still in
flight in the receive buffer at shutdown. Having a different ordering for each
version of Hyper-V seems unnecessary. There should be a way to get a stable
sequence here.

Let me see if I can shake more information out of the Windows team to see what
the handshake on the other side is. Let's not apply this until then.