Re: Crash during SATA reads

From: Jeff Garzik
Date: Wed Nov 11 2009 - 04:27:02 EST


On 11/11/2009 04:18 AM, Glenn Maynard wrote:
Pid: 1311, comm: gzip Not tainted (2.6.31.6 #1) G31M-ES2L
EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
EIP is at 0x0
EAX: c1ae78c0 EBX: c107cca9 ECX: c1ae78c0 EDX: 00000000
ESI: c1ae78c0 EDI: dfa3b2c0 EBP: df29bed0 ESP: df29be94
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process gzip (pid: 1311, ti=df29a000 task=df916800 task.ti=df29a000)
Stack:
c107ccb1 c107e549 00000200 c11587bc 00000000 00000001 df430a94 00000000
<0> 00000000 00005000 0000b000 00000000 dfa3b2c0 00000000 dfa3b2c0 dfa0f168
<0> c1158936 dfa20980 00000000 c11589ed 00000000 dfa20980 00000000 dfa3b2c0
Call Trace:
[<c107ccb1>] ? end_bio_bh_io_sync+0x28/0x30
[<c107e549>] ? bio_endio+0x24/0x26
[<c11587bc>] ? blk_update_request+0xdf/0x24e
[<c1158936>] ? blk_update_bidi_request+0xb/0x41
[<c11589ed>] ? blk_end_bidi_request+0x10/0x4f
[<c1158a5c>] ? blk_end_request+0x7/0xc
[<c11abcb2>] ? scsi_end_request+0x17/0x69
[<c11abfc3>] ? scsi_io_completion+0x173/0x335
[<c11a8330>] ? scsi_finish_command+0x70/0x86
[<c11ac6a6>] ? scsi_softirq_done+0xd7/0xdc
[<c115b3f1>] ? blk_done_softirq+0x51/0x5d
[<c101bde0>] ? __do_softirq+0x5f/0xc8
[<c101be6b>] ? do_softirq+0x22/0x26
[<c101becd>] ? irq_exit+0x29/0x34
[<c1004097>] ? do_IRQ+0x53/0x63
[<c1002ea9>] ? common_interrupt+0x29/0x30
Code: Bad EIP value.
EIP: [<00000000>] 0x0 SS:ESP 0068:df29be94
CR2: 0000000000000000
---[ end trace 79f49d6371afc159 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 1311, comm: gzip Tainted: G D 2.6.31.6 #1
Call Trace:
[<c101824c>] ? panic+0x41/0xde
[<c1004dcf>] ? oops_end+0x5c/0x66
[<c107cca9>] ? end_bio_bh_io_sync+0x20/0x30
[<c100eca7>] ? bad_area_nosemaphore+0xa/0xc
[<c126564e>] ? error_code+0x5e/0x64
[<c107cca9>] ? end_bio_bh_io_sync+0x20/0x30
[<c107007b>] ? file_update_time+0x8c/0xd8
[<c100ee87>] ? do_page_fault+0x0/0x1f9
[<c107ccb1>] ? end_bio_bh_io_sync+0x28/0x30
[<c107e549>] ? bio_endio+0x24/0x26
[<c11587bc>] ? blk_update_request+0xdf/0x24e
[<c1158936>] ? blk_update_bidi_request+0xb/0x41
[<c11589ed>] ? blk_end_bidi_request+0x10/0x4f

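Looks like it is dying in the block layer's I/O completion path: EIP is 0x0 (and CR2 is 0), and the return address on top of the stack is inside end_bio_bh_io_sync(), so the CPU jumped through a NULL function pointer. Maybe the bh->b_end_io() pointer is NULL.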

Does the attached patch trigger the added BUG_ON() statement?

Jeff



diff --git a/fs/buffer.c b/fs/buffer.c
index 6fa5302..267e4d1 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2939,6 +2939,7 @@ static void end_bio_bh_io_sync(struct bio *bio, int err)
 	if (unlikely (test_bit(BIO_QUIET,&bio->bi_flags)))
 		set_bit(BH_Quiet, &bh->b_state);
 
+	BUG_ON(bh->b_end_io == NULL);
 	bh->b_end_io(bh, test_bit(BIO_UPTODATE, &bio->bi_flags));
 	bio_put(bio);
 }
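For anyone following along outside the kernel, this is roughly what the added check buys. The following is a hypothetical userspace model, not kernel code: struct fake_bh, FAKE_BUG_ON(), fake_end_bio() and sample_end_io() are illustrative names. Without the check, a NULL b_end_io makes the indirect call jump to address 0, which matches the "EIP is at 0x0" in the oops. With the check, the failure is reported at the line that names the NULL pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct buffer_head: only the completion
 * callback and two flags the example handler sets are modelled. */
struct fake_bh {
	void (*b_end_io)(struct fake_bh *bh, int uptodate);
	int uptodate;
	int completed;
};

/* Userspace analogue of BUG_ON(): print where the check fired, then
 * abort(). The kernel's BUG_ON() raises an oops at that line instead. */
#define FAKE_BUG_ON(cond)                                       \
	do {                                                    \
		if (cond) {                                     \
			fprintf(stderr, "BUG at %s:%d: %s\n",   \
				__FILE__, __LINE__, #cond);     \
			abort();                                \
		}                                               \
	} while (0)

/* Models the tail of end_bio_bh_io_sync(): verify the callback before
 * the indirect call, so a NULL pointer is reported by name rather than
 * surfacing as a jump to address 0. */
static void fake_end_bio(struct fake_bh *bh, int uptodate)
{
	FAKE_BUG_ON(bh->b_end_io == NULL);
	bh->b_end_io(bh, uptodate);
}

/* Hypothetical completion handler: records the result of the "I/O". */
static void sample_end_io(struct fake_bh *bh, int uptodate)
{
	bh->uptodate = uptodate;
	bh->completed = 1;
}
```

With b_end_io set to sample_end_io, fake_end_bio() simply runs the handler. With b_end_io left NULL, it prints a "BUG at ..." line naming the failed condition and aborts.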