Re: Kernel 2.6.38.6 page allocation failure (ixgbe)

From: Yehuda Sadeh Weinraub
Date: Tue May 10 2011 - 10:20:52 EST


On Tue, May 10, 2011 at 7:04 AM, Stefan Majer <stefan.majer@xxxxxxxxx> wrote:
> Hi,
>
> I'm running 4 nodes with ceph on top of btrfs with a dual-port Intel
> X520 10Gb Ethernet card with the latest 3.3.9 ixgbe driver.
> During benchmarks I get the following stack.
> I can easily reproduce this by simply running rados bench from a fast
> machine using these 4 nodes as the ceph cluster.
> We saw this with the stock ixgbe driver from 2.6.38.6 and with the latest
> 3.3.9 ixgbe.
> This kernel is tainted because we use fusion-io iodrives as journal
> devices for btrfs.
>
> Any hints to nail this down are welcome.
>
> Greetings Stefan Majer
>
> May 10 15:26:40 os02 kernel: [ 3652.485219] cosd: page allocation
> failure. order:2, mode:0x4020

It looks like the machine running the cosd is crashing. Is that the case?
Are you running the ceph kernel module on the same machine as the cosd,
by any chance? If not, it could be some other fs bug (e.g., in the
underlying btrfs). Also, the stack here is quite deep, so there's a
chance of a stack overflow.
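
For reference, the "order:2, mode:0x4020" fields in the quoted failure
line can be decoded with a small script. This is only a rough sketch:
the 4 KiB page size is an assumption (typical for x86_64), and the two
flag values are assumed to match include/linux/gfp.h in 2.6.38-era
kernels, so check them against the actual source tree. The helper names
order_to_bytes and decode_mode are made up for illustration.

```python
# Sketch: decode the "order" and "mode" fields of a page allocation
# failure line such as "order:2, mode:0x4020".

PAGE_SIZE = 4096  # assumption: 4 KiB pages (typical x86_64)

# Assumption: a small subset of GFP bit values as defined in
# include/linux/gfp.h of 2.6.38-era kernels. Verify against your tree.
GFP_FLAGS = {
    0x20: "__GFP_HIGH",
    0x4000: "__GFP_COMP",
}


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size of a physically contiguous allocation of 2**order pages."""
    return (1 << order) * page_size


def decode_mode(mode):
    """Return the names of known flags set in mode, lowest bit first.

    Any bits not present in GFP_FLAGS are reported as one hex remainder.
    """
    names = []
    remaining = mode
    for bit, name in sorted(GFP_FLAGS.items()):
        if mode & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        names.append(hex(remaining))
    return names


if __name__ == "__main__":
    print(order_to_bytes(2))    # 4 pages of 4096 bytes
    print(decode_mode(0x4020))
```

Under those assumptions, the request is for 4 contiguous pages (16 KiB),
and the mode decodes to __GFP_HIGH plus __GFP_COMP, i.e. an allocation
that is not allowed to sleep. That kind of request can fail under memory
fragmentation even when free memory is available.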

Thanks,
Yehuda