Re: Revert "gro: Fix legacy path napi_complete crash"

From: David Miller
Date: Tue Mar 24 2009 - 17:37:05 EST


From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Date: Tue, 24 Mar 2009 23:09:28 +0800

> On Tue, Mar 24, 2009 at 03:39:42PM +0100, Ingo Molnar wrote:
> >
> > Subject: [PATCH] net: Fix netpoll lockup in legacy receive path
>
> Actually, this patch is still racy. If some interrupt comes in
> and we suddenly get the maximum amount of backlog we can still
> hang when we call __napi_complete incorrectly. It's unlikely
> but we certainly shouldn't allow that. Here's a better version.
>
> net: Fix netpoll lockup in legacy receive path

Hmmm...

> @@ -2588,9 +2588,10 @@ static int process_backlog(struct napi_struct *napi, int quota)
>  		local_irq_disable();
>  		skb = __skb_dequeue(&queue->input_pkt_queue);
>  		if (!skb) {
> +			list_del(&napi->poll_list);
> +			clear_bit(NAPI_STATE_SCHED, &napi->state);
>  			local_irq_enable();
> -			napi_complete(napi);
> -			goto out;
> +			break;
>  		}
>  		local_irq_enable();

I think the problem is that we need to do the GRO flush before
deleting from the poll list and clearing the NAPI_STATE_SCHED bit.

You can't disown the NAPI context until you've squared away the GRO
state, I think.

Ingo's case stresses TCP heavily, so I think he's hitting these GRO
cases often, as well as hitting the backlog maximum.

So this mis-ordering of completion operations could explain why
he still sees problems.
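
To make the suggested ordering concrete, here is a rough user-space
model in C. It is not kernel code and is untested against any real
driver. The struct and every mock_* name are made up for illustration;
only list_del(), clear_bit() and NAPI_STATE_SCHED (mentioned in the
comments) come from the patch quoted above.

```c
/* User-space model only; nothing here is a kernel API. */
#include <assert.h>
#include <stdbool.h>

struct mock_napi {
	bool sched;        /* stands in for the NAPI_STATE_SCHED bit */
	bool on_poll_list; /* stands in for membership via napi->poll_list */
	int  gro_held;     /* packets still held back for GRO merging */
	int  delivered;    /* packets already handed up the stack */
};

/* Stand-in for a GRO flush: hand every held packet up the stack. */
static void mock_gro_flush(struct mock_napi *n)
{
	n->delivered += n->gro_held;
	n->gro_held = 0;
}

/* Completion in the suggested order: square away GRO, then disown. */
static void mock_complete(struct mock_napi *n)
{
	mock_gro_flush(n);
	assert(n->gro_held == 0);  /* nothing may be left behind */
	n->on_poll_list = false;   /* models list_del(&napi->poll_list) */
	n->sched = false;          /* models clear_bit(NAPI_STATE_SCHED, ...) */
}

/* Stand-in for rescheduling: succeeds only if nobody owns the context. */
static bool mock_schedule(struct mock_napi *n)
{
	if (n->sched)
		return false;
	n->sched = true;
	n->on_poll_list = true;
	return true;
}
```

In this model, whoever wins mock_schedule() after mock_complete()
always finds gro_held at zero, which is the property I think the real
path needs before it gives up the context.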