What am I doing wrong? submit_bio() suddenly stops working...
From: Theodore Ts'o
Date: Wed Oct 20 2010 - 22:00:41 EST
Hey Jens,
I've been trying to figure out what I'm doing wrong. I've been trying
to convert the data writeback bath to use the bio layer. It mostly
works --- until all of sudden all calls to block_bio_queue(), either via
submit_bh() or via submit_bio(), start turning into no-ops.
I'm sure I'm doing something wrong, but the bio layer isn't terribly
well documented, so I'm not sure what it might be. The patch which
causes the problem can be found be found here:
http://userweb.kernel.org/~tytso/ext4-bio-patches/0006-Ext4-Use-bio-layer-instead-of-buffer-layer-in-mpage_.patch
Here is an except from an ftrace I've been taking to get to the bottom
of it. It's a combination of some trace_printk's, blktrace, and the
block_bio_queue tracepoint. The full log can be found at:
http://userweb.kernel.org/~tytso/ext4-bio-patches/kvm-console
It shows all of the blktrace events that shows up after block_bio_queue
tracepoint, but at some point, after jbd2 or ext4 calls submit_bh() or
submit_bio(), after the block_bio_queue tracepoint, we stop seeing the
blktrace events, and it looks like the block I/O layer stops answering
the phone. No complaints in dmesg, no BUG_ON's, no errors....
If I back out the ext4 bio patches, things work correctly, and as I
said, I'm pretty sure the bug is in my code. But the failure is
happening deep in the block I/O stack, and I can't figure out why it's
failing.
I'm hoping this rings a bell, and perhaps we should consider some of the
debugging trace_printk's as possible new tracepoints?
Any help you could give me would be greatly appreciated. Ideally, you
or someone can tell me what stupid thing I'm doing. :-)
Thanks,
- Ted
[ 215.663499] jbd2/sdb-2126 0..... 6976406us : submit_bh: <5>dev sdb blk 131072, ret_pc c0211ecb (__sync_dirty_buffer) pid 2126
[ 215.663499] jbd2/sdb-2126 0d.... 6976415us : 8,16 Q WS 1048576 + 8 [jbd2/sdb-8]
[ 215.663499] jbd2/sdb-2126 0..... 6976419us : block_bio_queue: 8,16 WS 1048576 + 8 [jbd2/sdb-8]
[ 215.663499] jbd2/sdb-2126 0d.... 6976439us : 8,16 m N cfq2126 alloced
[ 215.663499] jbd2/sdb-2126 0d.... 6976440us : 8,16 G WS 1048576 + 8 [jbd2/sdb-8]
[ 215.663499] jbd2/sdb-2126 0d.... 6976445us : 8,16 I W 1048576 + 8 [jbd2/sdb-8]
[ 215.663499] jbd2/sdb-2126 0d.... 6976446us : 8,16 m N cfq2126 insert_request
[ 215.663499] jbd2/sdb-2126 0d.... 6976448us : 8,16 m N cfq2126 add_to_rr
[ 215.663499] jbd2/sdb-2126 0d.... 6976451us : 8,16 m N cfq2126 preempt
[ 215.663499] jbd2/sdb-2126 0d.... 6976452us : 8,16 m N cfq2993 slice expired t=1
[ 215.663499] jbd2/sdb-2126 0d.... 6976453us : 8,16 m N cfq2993 resid=-331
[ 215.663499] jbd2/sdb-2126 0d.... 6976456us : 8,16 m N cfq2993 sl_used=12 disp=2 charge=12 iops=0 sect=296
[ 215.663499] jbd2/sdb-2126 0d.... 6976788us : 8,16 m N cfq workload slice:45
[ 215.663499] jbd2/sdb-2126 0d.... 6976791us : 8,16 m N cfq2126 set_active wl_prio:0 wl_type:2
[ 215.663499] jbd2/sdb-2126 0..... 6976797us : __wait_on_buffer: <5>dev sdb blk 131072, ret_pc c0210e83 (wait_on_buffer) pid 2126
[ 215.663499] jbd2/sdb-2126 0d.... 6976801us : 8,16 U N [jbd2/sdb-8] 10
[ 215.663499] jbd2/sda-1439 0..... 10966709us : submit_bh: <5>dev sda blk 128401, ret_pc c02128fb (__block_write_full_page) pid 1439
[ 215.663499] jbd2/sda-1439 0..... 10966719us : block_bio_queue: 8,0 WS 1027208 + 8 [jbd2/sda-8]
[ 215.663499] jbd2/sda-1439 0..... 10966750us : submit_bh: <5>dev sda blk 128402, ret_pc c02128fb (__block_write_full_page) pid 1439
...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/