RE: [PATCH net] hv_netvsc: Fix napi reschedule while receive completion is busy

From: Haiyang Zhang
Date: Mon Jul 09 2018 - 14:33:52 EST




> -----Original Message-----
> From: Stephen Hemminger <stephen@xxxxxxxxxxxxxxxxxx>
> Sent: Monday, July 9, 2018 2:15 PM
> To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxxxxxx>
> Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx;
> netdev@xxxxxxxxxxxxxxx; olaf@xxxxxxxxx; Stephen Hemminger
> <sthemmin@xxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> devel@xxxxxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx
> Subject: Re: [PATCH net] hv_netvsc: Fix napi reschedule while receive
> completion is busy
>
> On Mon, 9 Jul 2018 16:43:19 +0000
> Haiyang Zhang <haiyangz@xxxxxxxxxxxxxxxxx> wrote:
>
> > From: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> >
> > If out ring is full temporarily and receive completion cannot go out,
> > we may still need to reschedule napi if other conditions are met.
> > Otherwise the napi poll might be stopped forever, and cause network
> > disconnect.
> >
> > Fixes: 7426b1a51803 ("netvsc: optimize receive completions")
> > Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > ---
> > drivers/net/hyperv/netvsc.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> > index 8e9d0ee1572b..caaf5054f446 100644
> > --- a/drivers/net/hyperv/netvsc.c
> > +++ b/drivers/net/hyperv/netvsc.c
> > @@ -1285,14 +1285,14 @@ int netvsc_poll(struct napi_struct *napi, int
> budget)
> > nvchan->desc = hv_pkt_iter_next(channel, nvchan->desc);
> > }
> >
> > - /* If send of pending receive completions suceeded
> > - * and did not exhaust NAPI budget this time
> > + send_recv_completions(ndev, net_device, nvchan);
> > +
> > + /* If it did not exhaust NAPI budget this time
> > * and not doing busy poll
> > * then re-enable host interrupts
> > * and reschedule if ring is not empty.
> > */
> > - if (send_recv_completions(ndev, net_device, nvchan) == 0 &&
> > - work_done < budget &&
> > + if (work_done < budget &&
> > napi_complete_done(napi, work_done) &&
> > hv_end_read(&channel->inbound) &&
> > napi_schedule_prep(napi)) {
>
> This patch doesn't look right. I think the existing code works as written.
>
> If send_receive_completions is unable to send because ring is full then
> vmbus_sendpacket will return -EBUSY which gets returns from
> send_receive_completions. Because the return is non-zero, the driver will not
> call napi_complete_done.
> Since napi_complete_done was not called, NAPI will reschedule the napi poll
> routine.

With the existing code, we found in test, the rx_comp_busy counter increased,
one of the in-ring mask is 1, but guest is not reading it... With this patch, the
pending receive completion will stay in the buffer (no loss), and be sent next time.
It solves the disconnection problem when high number of connections.

If not calling napi_complete_done(), upper layer should guarantee napi_schedule,
then seems the upper NAPI code may have a bug -- the auto scheduling did not
happen in this case. I will check it further.

Thanks,
- Haiyang