Re: Crash in __netif_receive_skb

From: Avleen Vig
Date: Wed Apr 04 2012 - 15:35:40 EST

On Thu, Mar 29, 2012 at 4:42 PM, Avleen Vig <avleen@xxxxxxxxx> wrote:
> On Wed, Mar 28, 2012 at 10:01 PM, Avleen Vig <avleen@xxxxxxxxx> wrote:
>> On Wed, Mar 28, 2012 at 8:31 PM, Avleen Vig <avleen@xxxxxxxxx> wrote:
>>> Hi folks, someone in #kernel recommended I email these two lists. Hope
>>> they're the right place.
>>>
>>> We're running 2.6.32-220.4.1.el6.x86_64 on CentOS 6.2, and getting a
>>> repeated crash:
>>> https://gist.github.com/2231998
>>>
>>> We can make this happen pretty easily just by passing some network
>>> traffic and waiting a while.
>>> I couldn't find any references to this particular issue.
>>> I have vmcore files and am happy to dig into it if it would help (as
>>> long as someone can tell me what to do :))
>>
>> I hope this debugging is legit, I'm really new to this level of insight.
>>
>> I think the problem is in include/linux/netpoll.h, at the "if"
>> statement at line 86:
>> static inline int netpoll_receive_skb(struct sk_buff *skb)
>> {
>> 	if (!list_empty(&skb->dev->napi_list))
>> 		return netpoll_rx(skb);
>> 	return 0;
>> }
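For context, the test on that `if` line is list_empty(). The kernel treats a list head as empty when its `next` pointer points back at the head itself, so the check has to load `head->next`. A list head is never "null" in the empty-list sense. The check can only fault if the address used to reach the head is bad. The sketch below is a userspace mock written for illustration. It is not the kernel's include/linux/list.h, and init_list_head() is a hypothetical stand-in for INIT_LIST_HEAD().

```c
#include <stdbool.h>

/* Userspace mock of the kernel's struct list_head (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for INIT_LIST_HEAD(): an empty list is a head
 * whose next and prev both point back at the head itself. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Mock of list_add(): insert @entry right after @head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Same comparison list_empty() makes: head->next == head.
 * It must read head->next, so @head has to point at valid memory. */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

An initialised head reports empty until an entry is linked in after it.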
>>
>>
>> This is based on poking around in the crash dump:
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
>> IP: [<ffffffff8142bb40>] __netif_receive_skb+0x60/0x6e0
>> crash> dis -rl ffffffff8142bb40
>> ....
>> /usr/src/debug/kernel-2.6.32-220.7.1.el6/linux-2.6.32-220.7.1.el6.x86_64/include/linux/netpoll.h: 86
>> 0xffffffff8142bb33 <__netif_receive_skb+83>: mov 0x20(%rbx),%r12
>> 0xffffffff8142bb37 <__netif_receive_skb+87>: mov %r12,-0x38(%rbp)
>> 0xffffffff8142bb3b <__netif_receive_skb+91>: lea 0x60(%r12),%rax
>> 0xffffffff8142bb40 <__netif_receive_skb+96>: cmp %rax,0x60(%r12)
>>
>>
>>
>> I *think* this means that "&skb->dev->napi_list" is null when we're
>> trying to compare it, rather than being a list.
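Reading the four instructions together helps, assuming (as the netpoll.h:86 attribution suggests) that %rbx holds the skb:

* `mov 0x20(%rbx),%r12` loads skb->dev.
* `lea 0x60(%r12),%rax` computes &dev->napi_list.
* `cmp %rax,0x60(%r12)` is list_empty() comparing napi_list.next with that address.

For a memory operand of the form offset(%reg), the faulting address is reg + offset. A fault at 0x0000000000000060 therefore implies %r12 was 0. That suggests skb->dev itself was NULL, rather than anything inside napi_list. The arithmetic is sketched below. The struct mock_net_device and the helper implied_base() are hypothetical. The struct is padded only so that napi_list lands at offset 0x60, and it is not the real struct net_device layout.

```c
#include <stddef.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical mock: padding places napi_list at offset 0x60, matching
 * the 0x60(%r12) operand above. Only the offset matters here. */
struct mock_net_device {
	char fields_before_napi_list[0x60];
	struct list_head napi_list;
};

/* Hypothetical helper: for an access "offset(%reg)" that faults at
 * fault_addr, the register must have held fault_addr - offset. */
static uintptr_t implied_base(uintptr_t fault_addr, size_t field_offset)
{
	return fault_addr - field_offset;
}
```

With the oops address 0x60 and a field offset of 0x60, implied_base() gives 0, i.e. a NULL device pointer.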
>>
>> If it matters, this is inside LXC containers.
>
> We've traced this to a problem with machines that have multiple hard
> drives AND have NAPI enabled for the NIC driver.
>
> We recompiled the e1000e driver with NAPI disabled, with:
> make CFLAGS_EXTRA=-DE1000E_NO_NAPI
>
> and everything works great now.

Untrue! This eventually failed too, but after a lot more debugging, we
think we've nailed it:
Multicast

We use Ganglia on all of our nodes (and we were setting it up inside
the LXC containers), and Ganglia both listens and sends on multicast.

When gmond was starting inside the containers, it was giving an error:
    Apr 4 17:00:42 hostname /usr/sbin/gmond[551]: Error creating multicast server mcast_join=239.2.11.110 port=8649 mcast_if=NULL family='inet4'. Exiting.
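The log line suggests gmond binds UDP port 8649 and joins group 239.2.11.110 without naming an interface. In that output, mcast_if=NULL most likely just reflects that no interface was configured, rather than a NULL inside the kernel.

Below is a minimal userspace sketch of such a join using the standard sockets API (ip(7), IP_ADD_MEMBERSHIP). It is not gmond's code, and the helpers is_ipv4_multicast() and join_ipv4_group() are hypothetical. With imr_interface set to INADDR_ANY, the kernel picks the interface from the routing table. Inside a container without a usable route for the group, the setsockopt() can plausibly fail (for example with ENODEV), which would fit gmond exiting.

```c
#define _DEFAULT_SOURCE /* keep struct ip_mreq visible under strict -std modes */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: 1 if @addr is an IPv4 multicast address
 * (224.0.0.0/4), 0 for any other valid IPv4 address, -1 if it does
 * not parse as dotted-quad IPv4. */
static int is_ipv4_multicast(const char *addr)
{
	struct in_addr a;

	if (inet_pton(AF_INET, addr, &a) != 1)
		return -1;
	return (ntohl(a.s_addr) & 0xf0000000u) == 0xe0000000u;
}

/* Hypothetical helper: bind UDP @port on all addresses, then ask the
 * kernel to join @group. imr_interface = INADDR_ANY means "no interface
 * given", so the kernel chooses one via the routing table.
 * Returns the socket fd, or -1 with errno set. */
static int join_ipv4_group(const char *group, unsigned short port)
{
	struct sockaddr_in sa;
	struct ip_mreq mreq;
	int saved_errno;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_ANY);
	sa.sin_port = htons(port);
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		goto fail;

	memset(&mreq, 0, sizeof(mreq));
	if (inet_pton(AF_INET, group, &mreq.imr_multiaddr) != 1) {
		errno = EINVAL; /* inet_pton() does not set errno for a bad string */
		goto fail;
	}
	mreq.imr_interface.s_addr = htonl(INADDR_ANY);
	if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
		       &mreq, sizeof(mreq)) < 0)
		goto fail; /* e.g. ENODEV if no route/interface for the group */

	return fd;

fail:
	saved_errno = errno; /* close() could overwrite errno */
	close(fd);
	errno = saved_errno;
	return -1;
}
```

Calling join_ipv4_group("239.2.11.110", 8649) inside a container and printing strerror(errno) on failure would show whether the join fails for lack of a route.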

It occurred to me that if the kernel is trying to read a multicast
packet from a socket buffer, but the container can't handle multicast
(mcast_if=NULL), that would explain why we got a NULL pointer
dereference in &skb->dev->napi_list.
(At least, this makes sense in my head.)

We disabled gmond in the containers, so nothing should be listening
for multicast packets, and everything is stable again.
Note that this ONLY seems to happen for us when we're using the
onboard 82574L Intel NIC with the e1000e driver.
We have other servers with the 82575 NIC, which uses the igb driver,
and they don't exhibit this problem.

Is anyone from Intel here who could look into this?
