Re: [RFC] hv_netvsc: safer orderly shutdown

From: Vitaly Kuznetsov
Date: Fri Nov 10 2017 - 05:42:50 EST


Stephen Hemminger <stephen@xxxxxxxxxxxxxxxxxx> writes:

> Several types of control operations require that the underlying RNDIS
> infrastructure be restarted. This patch changes the ordering of the
> shutdown to avoid race conditions.
> Stop all transmits before doing RNDIS halt. This involves stopping the
> network device transmit queues, then waiting for all outstanding
> sends before informing host to halt.
>
> Also, check for successful restart of the device when after the
> change is done.
>
> For review, not tested on Hyper-V yet.
>
> Signed-off-by: Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>
> ---
> drivers/net/hyperv/netvsc_drv.c | 40 ++++++++++++++++++++++++++++++---------
> drivers/net/hyperv/rndis_filter.c | 23 +++++++++++-----------
> 2 files changed, 42 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index da216ca4f2b2..3afa082e093d 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -855,8 +855,10 @@ static int netvsc_set_channels(struct net_device *net,
>
> orig = nvdev->num_chn;
> was_opened = rndis_filter_opened(nvdev);
> - if (was_opened)
> + if (was_opened) {
> + netif_tx_disable(net);
> rndis_filter_close(nvdev);
> + }

I was also experimenting with this and I think it may also make sense to
add napi_disable() for all queues here.

It also seems that the usual TX disable pattern is

netif_tx_stop_all_queues(net);
netif_tx_disable(net);

not sure why..

>
> memset(&device_info, 0, sizeof(device_info));
> device_info.num_chn = count;
> @@ -881,8 +883,13 @@ static int netvsc_set_channels(struct net_device *net,
> }
> }
>
> - if (was_opened)
> - rndis_filter_open(nvdev);
> + if (was_opened) {
> + ret = rndis_filter_open(nvdev);
> + if (ret)
> + netdev_err(net, "reopening device failed: %d\n", ret);
> + else
> + netif_tx_start_all_queues(net);
> + }
>
> /* We may have missed link change notifications */
> net_device_ctx->last_reconfig = 0;
> @@ -971,8 +978,10 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
>
> netif_device_detach(ndev);
> was_opened = rndis_filter_opened(nvdev);
> - if (was_opened)
> + if (was_opened) {
> + netif_tx_disable(net);
> rndis_filter_close(nvdev);
> + }

Shall we just nove netif_tx_disable() & friends to rndis_filter_close()?

>
> memset(&device_info, 0, sizeof(device_info));
> device_info.ring_size = ring_size;
> @@ -1004,8 +1013,13 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
> }
> }
>
> - if (was_opened)
> - rndis_filter_open(nvdev);
> + if (was_opened) {
> + ret = rndis_filter_open(nvdev);
> + if (ret)
> + netdev_err(net, "reopening device failed: %d\n", ret);
> + else
> + netif_tx_start_all_queues(net);
> + }

Yea, the main problem here is that we can't do much if we fail, the
device will be completely unusable. That's not something users exepct
doing MTU change...

>
> netif_device_attach(ndev);
>
> @@ -1547,8 +1561,10 @@ static int netvsc_set_ringparam(struct net_device *ndev,
>
> netif_device_detach(ndev);
> was_opened = rndis_filter_opened(nvdev);
> - if (was_opened)
> + if (was_opened) {
> + netif_tx_disable(net);
> rndis_filter_close(nvdev);
> + }
>
> rndis_filter_device_remove(hdev, nvdev);
>
> @@ -1566,8 +1582,14 @@ static int netvsc_set_ringparam(struct net_device *ndev,
> }
> }
>
> - if (was_opened)
> - rndis_filter_open(nvdev);
> + if (was_opened) {
> + ret = rndis_filter_open(nvdev);
> + if (ret)
> + netdev_err(net, "reopening device failed: %d\n", ret);
> + else
> + netif_tx_start_all_queues(net);
> + }
> +
> netif_device_attach(ndev);
>
> /* We may have missed link change notifications */
> diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
> index 0648eebda829..164f5ffe9c50 100644
> --- a/drivers/net/hyperv/rndis_filter.c
> +++ b/drivers/net/hyperv/rndis_filter.c
> @@ -948,11 +948,20 @@ static void rndis_filter_halt_device(struct rndis_device *dev)
> struct net_device_context *net_device_ctx = netdev_priv(dev->ndev);
> struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
>
> + /* tell bottom half that deice is being closed */
> + nvdev->destroy = true;
> +
> + /* Force flag to be ordered before waiting */
> + wmb();
> +
> + /* Wait for all send completions */
> + wait_event(nvdev->wait_drain, netvsc_device_idle(nvdev));
> +
> /* Attempt to do a rndis device halt */
> request = get_rndis_request(dev, RNDIS_MSG_HALT,
> RNDIS_MESSAGE_SIZE(struct rndis_halt_request));
> if (!request)
> - goto cleanup;
> + return;
>
> /* Setup the rndis set */
> halt = &request->request_msg.msg.halt_req;
> @@ -963,17 +972,7 @@ static void rndis_filter_halt_device(struct rndis_device *dev)
>
> dev->state = RNDIS_DEV_UNINITIALIZED;
>
> -cleanup:
> - nvdev->destroy = true;
> -
> - /* Force flag to be ordered before waiting */
> - wmb();
> -
> - /* Wait for all send completions */
> - wait_event(nvdev->wait_drain, netvsc_device_idle(nvdev));
> -
> - if (request)
> - put_rndis_request(dev, request);
> + put_rndis_request(dev, request);
> }
>
> static int rndis_filter_open_device(struct rndis_device *dev)

I'll give this patch a shot to see if anything blows up. Thanks!

--
Vitaly