Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM

From: Boris Ostrovsky
Date: Sun Jan 29 2017 - 18:11:11 EST

On 01/19/2017 11:35 AM, Vineeth Remanan Pillai wrote:
From: Vineeth Remanan Pillai <vineethp@xxxxxxxxxx>

During an OOM scenario, request slots could not be created as skb
allocation fails. So the netback cannot pass in packets and netfront
wrongly assumes that there is no more work to be done and it disables
polling. This causes Rx to stall.

The issue is with the retry logic which schedules the timer if the
created slots are less than NET_RX_SLOTS_MIN. The count of new request
slots to be pushed are calculated as a difference between new req_prod
and rsp_cons which could be more than the actual slots, if there are
unconsumed responses.

The fix is to calculate the count of newly created slots as the
difference between new req_prod and old req_prod.

Signed-off-by: Vineeth Remanan Pillai <vineethp@xxxxxxxxxx>
Reviewed-by: Juergen Gross <jgross@xxxxxxxx>
Changes in v2:
- Removed the old implementation of enabling polling on
skb allocation error.
- Corrected the refill timer logic to schedule when newly
created slots since last push is less than NET_RX_SLOTS_MIN.

There are couple of problems with this patch.
1. The 'if' clause now evaluates to true on pretty much every call to xennet_alloc_rx_buffers().
2. It tickles a latent bug during resume where the timer triggers before we re-connect. The trouble is that we now try to dereference queue->rx.sring which is NULL since we disconnect in netfront_resume(). (Curiously, I only observe it with 32-bit guests)

I'll send a patch later that will delete the timer since it looks like a bug to me in any case but the first problem seems to be more serious than the problem that this patch addresses.


drivers/net/xen-netfront.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 40f26b6..2c7c29f 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -321,7 +321,7 @@ static void xennet_alloc_rx_buffers(struct
netfront_queue *queue)
queue->rx.req_prod_pvt = req_prod;

/* Not enough requests? Try again later. */
- if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
+ if (req_prod - queue->rx.sring->req_prod < NET_RX_SLOTS_MIN) {
mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));