Re: 2.6.16-git4: kernel BUG at block/ll_rw_blk.c:3497

From: Mark Lord
Date: Wed Mar 29 2006 - 08:50:00 EST


Al Viro wrote:
> On Wed, Mar 29, 2006 at 10:16:43AM +0200, Jens Axboe wrote:
> > triggering. What sort of testing were you running, exactly?

It's a dual 1GHz-P3 SMP test box, with three SATA drives.
Each drive has two partitions, and /dev/md0 was RAID5
over the first partitions of each drive (no spares),
and /dev/md1 was RAID over the second partitions (no spares).

Both /dev/md[01] were formatted as ext2, and mounted,
and several processes were running, copying directory trees
back and forth between the two RAIDs, while the MD layer
was still doing resyncs underneath it all.

Basically, trying really hard to stress everything.

> I really wonder why it's the call from do_exit() that triggers it.
> The thing is, we get off-by-exactly-one here and all previous callers
> of that puppy would be elsewhere (cfq, mostly).
>
> IOW, we get exactly one extra call of put_io_context() _and_ have it
> happen before do_exit() (i.e. from normal IO paths). Interesting...
>
> Is there any way to reproduce it without too much PITA?

It's only happened the once, so far. Tell me how you want the code
instrumented, and I'll do it, in case I manage to get it to happen again.
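
To make the off-by-one concrete, here is a minimal user-space sketch of
the kind of get/put logging that could help pin down an unbalanced put.
It is only an illustration: struct toy_ctx, toy_init(), toy_get() and
toy_put() are made-up names, not anything in ll_rw_blk.c, and the real
instrumentation would have to go into the kernel's own reference
counting paths.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a refcounted object (NOT struct io_context). */
struct toy_ctx {
    int refcount;    /* current number of references */
    int gets;        /* total toy_get() calls seen */
    int puts;        /* total toy_put() calls seen */
    int underflows;  /* puts that arrived with refcount already at 0 */
};

/* The creator holds the initial reference. */
static void toy_init(struct toy_ctx *c)
{
    assert(c != NULL);
    c->refcount = 1;
    c->gets = 0;
    c->puts = 0;
    c->underflows = 0;
}

/* Take a reference and log who took it. */
static void toy_get(struct toy_ctx *c, const char *caller)
{
    assert(c != NULL);
    c->refcount++;
    c->gets++;
    fprintf(stderr, "get from %s -> refcount %d\n", caller, c->refcount);
}

/*
 * Drop a reference and log who dropped it.
 * Returns 0 normally, -1 if the count was already zero, which is the
 * point where a kernel-style sanity check would fire.
 */
static int toy_put(struct toy_ctx *c, const char *caller)
{
    assert(c != NULL);
    c->puts++;
    if (c->refcount == 0) {
        c->underflows++;
        fprintf(stderr, "UNBALANCED put from %s (gets=%d puts=%d)\n",
                caller, c->gets, c->puts);
        return -1;
    }
    c->refcount--;
    fprintf(stderr, "put from %s -> refcount %d\n", caller, c->refcount);
    return 0;
}
```

In this toy model, one toy_get() followed by two toy_put() calls from an
"IO path" caller drops the count to zero without any complaint, and it is
the later put from the "exit path" caller that returns -1. The caller
that trips the check is not the one that made the extra call, so the
per-caller log is what would identify the culprit.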

Cheers