Re: 2.6.17-mm6

From: Reuben Farrelly
Date: Fri Jul 07 2006 - 17:13:32 EST

Next message: Matt Helsley: "Re: [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking"
Previous message: Jeff Garzik: "Re: 2.6.17-mm6"
In reply to: Andrew Morton: "Re: 2.6.17-mm6"
Next in thread: Andrew Morton: "Re: 2.6.17-mm6"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 7/07/2006 9:35 p.m., Andrew Morton wrote:

On Fri, 07 Jul 2006 21:17:03 +1200
Reuben Farrelly <reuben-lkml@xxxxxxxx> wrote:

On 3/07/2006 10:03 p.m., Andrew Morton wrote:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm6/

- A major update to the e1000 driver.

- 1394 updates

This release has been working quite well after some initial hiccups (see -mm hotfix), but a couple of hours ago another bad thing (tm) happened.

At the time I was moving 100G of data from a non-RAID partition to a new RAID-1 md array that I had created earlier. Both are ext3 partitions.

The word "barrier" of course comes to mind again. I'm not sure NeilB is the culprit this time either, but I've cc'd him just in case.

The file copy went on happily for quite a while (maybe 10 mins or so) under very high IO load before blowing up as below. The terminal was spewing out constant traces but hopefully the right ones are here as these are the first few (if not, I have copied a bit more).

Yes, the very first oops is by far the most important to capture.

You can add pause_on_oops=100000 to the kernel boot command line to make
the machine freeze after outputting the first oops. That'll certainly
prevent the oops messages from getting to the log files, but it will also
prevent it from scrolling away or swamping your log device.
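
After rebooting, one way to confirm that the option actually reached the kernel is to look at /proc/cmdline. A minimal sketch follows; the helper names and the example command line are made up for illustration, and only the pause_on_oops=100000 setting comes from the message above.

```python
def kernel_param(cmdline, name):
    """Return the value of name=value from a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name and sep:
            return value
    return None


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


# Hypothetical command line; on a live box use read_cmdline() instead.
example = "ro root=/dev/sda1 pause_on_oops=100000"
print(kernel_param(example, "pause_on_oops"))
```

If the helper returns None on the running system, the boot loader entry was probably not the one that got edited.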

sh-3.1# mv /store-old/* /store/
Unable to handle kernel paging request at ffff81043e345490 RIP:
[<ffffffff802620f2>] memcpy+0x12/0xb0
PGD 8063 PUD 0
Oops: 0000 [1] SMP
last sysfs file: /kernel/uevent_seqnum
CPU 1
Modules linked in: binfmt_misc ide_cd iTCO_wdt i2c_i801 cdrom serio_raw ide_disk
Pid: 165, comm: pdflush Not tainted 2.6.17-mm6 #2
RIP: 0010:[<ffffffff802620f2>] [<ffffffff802620f2>] memcpy+0x12/0xb0
RSP: 0018:ffff81003ed31828 EFLAGS: 00010002
RAX: ffff810001faec18 RBX: 0000000000000010 RCX: 0000000000000001
RDX: 0000000000000080 RSI: ffff81043e345490 RDI: ffff810001faec18
RBP: ffff81003ed31898 R08: ffff81003f66a800 R09: 0000000000000000
R10: ffff810029b364b0 R11: 0000000000000000 R12: ffff810001faec00
R13: ffff81003f6ff140 R14: ffff81003f6ea240 R15: 0000000000000010
FS: 0000000000000000(0000) GS:ffff810037ffe440(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffff81043e345490 CR3: 000000003dc32000 CR4: 00000000000006e0
Process pdflush (pid: 165, threadinfo ffff81003ed30000, task ffff810037f2b840)
Stack: 0000000000000010 ffffffff8025e9f0 ffff81003f66a800 0001120000000000
0000000000011200 000112003f650080 ffff81003eca39f0 ffff81003ed318c8
ffff81003ed31898 0000000000000246 ffff81003f6ff140 0000000000011200
Call Trace:
[<ffffffff8025e9f0>] cache_alloc_refill+0xc9/0x538
[<ffffffff802b96c6>] __kmalloc+0x86/0x96
[<ffffffff802ae460>] __kzalloc+0xf/0x2f
[<ffffffff8041d598>] r1bio_pool_alloc+0x21/0x3a
[<ffffffff802230e4>] mempool_alloc+0x44/0xfb
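
One quick thing to read off an oops like this is which register held the faulting address. CR2 is the address the CPU faulted on, so a register with the same value is the bad pointer. A rough sketch that does the comparison, using register values copied from the dump above (the function names are only for illustration):

```python
import re

# Register lines copied from the oops above.
OOPS_TEXT = """\
RAX: ffff810001faec18 RBX: 0000000000000010 RCX: 0000000000000001
RDX: 0000000000000080 RSI: ffff81043e345490 RDI: ffff810001faec18
RBP: ffff81003ed31898 R08: ffff81003f66a800 R09: 0000000000000000
R10: ffff810029b364b0 R11: 0000000000000000 R12: ffff810001faec00
R13: ffff81003f6ff140 R14: ffff81003f6ea240 R15: 0000000000000010
CR2: ffff81043e345490 CR3: 000000003dc32000 CR4: 00000000000006e0
"""

# A two or three character register name, a colon, then 16 hex digits.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b")


def parse_registers(text):
    """Map register name to integer value for every 'NAME: value' pair."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def registers_matching_fault(regs):
    """Names of registers whose value equals the fault address in CR2."""
    fault = regs["CR2"]
    return sorted(n for n, v in regs.items() if v == fault and n != "CR2")


regs = parse_registers(OOPS_TEXT)
print(registers_matching_fault(regs))
```

Here the match is RSI, which on x86-64 carries memcpy's source argument. The error code 0000 says it was a kernel-mode read of a non-present page, so memcpy was handed a bogus source pointer.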

The core slab data structures were wrecked. For kmalloc(), no less. Something secretly destroyed your kernel, and it could be anything. Nice.

Having now turned on slab debugging, is it possibly related to this message which appeared in my log when I was booting up earlier?

Jul 8 02:49:39 tornado kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 8 02:49:39 tornado kernel: Adding 497972k swap on /dev/sdc9. Priority:-1 extents:1 across:497972k
Jul 8 02:49:40 tornado kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Jul 8 02:49:40 tornado kernel: Netfilter messages via NETLINK v0.30.
Jul 8 02:49:40 tornado kernel: ip_conntrack version 2.4 (4060 buckets, 32480 max) - 288 bytes per conntrack
Jul 8 02:49:40 tornado kernel: Slab corruption: start=ffff81003efd7000, len=4096
Jul 8 02:49:40 tornado kernel: 170: ff ff ff ff 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b
Jul 8 02:49:40 tornado kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
Jul 8 02:49:40 tornado kernel: GRE over IPv4 tunneling driver
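
For reading that dump line: with slab debugging on, freed objects are filled with the poison byte 0x6b, so any byte in the dump that is not 6b was written after the object was freed. The sketch below picks out those bytes. It assumes the leading "170" is a hex offset from the reported start address, and it ignores the 0xa5 end marker that normally closes a poisoned object; the function names are hypothetical.

```python
POISON_FREE = 0x6B  # byte the slab debug code writes over freed objects


def parse_dump_line(line):
    """Split '170: ff ff ...' into (offset, bytes); offset assumed to be hex."""
    offset_text, _, hex_bytes = line.partition(":")
    return int(offset_text, 16), bytes.fromhex(hex_bytes.replace(" ", ""))


def overwritten_ranges(offset, data, poison=POISON_FREE):
    """Return (start, end) offsets, end exclusive, where bytes differ from poison."""
    ranges, start = [], None
    for i, b in enumerate(data):
        if b != poison and start is None:
            start = i
        elif b == poison and start is not None:
            ranges.append((offset + start, offset + i))
            start = None
    if start is not None:
        ranges.append((offset + start, offset + len(data)))
    return ranges


# Payload of the "Slab corruption" dump line from the log above.
line = "170: ff ff ff ff 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b"
offset, data = parse_dump_line(line)
for start, end in overwritten_ranges(offset, data):
    print(f"bytes 0x{start:x}..0x{end - 1:x} no longer hold the free poison")
```

For this line it reports offsets 0x170 through 0x177, i.e. eight bytes (ff ff ff ff 00 00 00 00) written into memory that had already been freed.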

reuben

