Re: [PATCH] netpoll can lock up on low memory.

From: Ingo Molnar
Date: Sat Aug 06 2005 - 02:45:53 EST



* Andi Kleen <ak@xxxxxxx> wrote:

> On Fri, Aug 05, 2005 at 01:01:57PM -0700, Matt Mackall wrote:
> > The netpoll philosophy is to assume that its traffic is an absolute
> > priority - it is better to potentially hang trying to deliver a panic
> > message than to give up and crash silently.
>
> That would be ok if netpoll was only used to deliver panics. But it is
> not. It delivers all messages, and you cannot hang the kernel during
> that. Actually even for panics it is wrong, because often it is more
> important to reboot in a panic than (with a panic timeout) to actually
> deliver the panic. That's needed e.g. in a failover cluster.

without going into the merits of this discussion, reliable failover
clusters must include (and do include) an external ability to cut power.
No amount of in-kernel logic will prevent the kernel from hanging, given
a bad enough kernel bug.

So the right question is not 'can we prevent the kernel from hanging,
ever' (we cannot), but 'which change makes it less likely for the kernel
to hang'. (and, obviously: assuming all other kernel components are
functioning per specification, netpoll itself most not hang :-)

even a plain printk to VGA can hang in certain kernel crashes. Netpoll
is more complex and thus has more exposure to hangs. E.g. netpoll relies
on the network driver to correctly recycle skbs within a bound amount of
time. If the network driver leaks skbs, it's game over for netpoll.

[ i'd prefer a hang over nondeterministic behavior, and e.g. losing
console messages is sure nondeterministic behavior. What if the
console message is "WARNING: the box has just been broken into"? ]

we could do one thing (see the patch below): i think it would be useful
to fill up the netlogging skb queue straight at initialization time.
Especially if netpoll is used for dumping alone, the system might not be
in a situation to fill up the queue at the point of crash, so better be
a bit more prepared and keep the pipeline filled.

Ingo

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

--- net/core/netpoll.c.orig
+++ net/core/netpoll.c
@@ -720,6 +720,8 @@ int netpoll_setup(struct netpoll *np)
}
/* last thing to do is link it to the net device structure */
ndev->npinfo = npinfo;
+ /* fill up the skb queue */
+ refill_skbs();

return 0;

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/