I/O scheduler on its knees, pvmove hangs other tasks

From: martin f krafft
Date: Sat Oct 01 2011 - 05:29:13 EST

Dear list,

I have a system with an LVM VG on a RAID6 (md). I need to decrease
the size of the md device to make room for another partition.
Therefore, I am using pvmove with the "anywhere" allocation policy to
move the physical extents at the end of the disk to the front.

In order not to disturb the system too much, I use ionice -c3 to put
both pvmove and the associated kcopyd process into the idle I/O
scheduling class. The disk block devices use the cfq scheduler.
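
For anyone wanting to double-check which scheduler a disk is actually
using, here is a minimal sketch. It assumes the usual sysfs format, in
which /sys/block/<dev>/queue/scheduler lists the available schedulers
and marks the active one with square brackets. The helper names and the
sample string are made up for illustration.

```python
import re


def active_scheduler(sysfs_text):
    """Return the scheduler name shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def read_scheduler(device):
    """Read the active scheduler for a block device such as 'sda' (needs Linux sysfs)."""
    with open("/sys/block/%s/queue/scheduler" % device) as fh:
        return active_scheduler(fh.read())


if __name__ == "__main__":
    # Sample sysfs content (an assumption, not copied from the affected system).
    print(active_scheduler("noop anticipatory deadline [cfq]"))  # cfq
```

Since the VG sits on md, each member disk would need to be checked
separately, e.g. read_scheduler("sda"), read_scheduler("sdb"), and so on.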

The problem is that the kernel does not seem to care. The pvmove
process happily chugs along at what seems to be maximum speed. At the
same time, other processes, like /bin/ls, simply block and remain in
'D' state *forever*. Even after the pvmove process exits, these
processes do not recover.
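
A rough way to list the tasks stuck in 'D' state is to read the state
field from /proc/<pid>/stat. The sketch below assumes the standard
procfs layout (pid, then the command name in parentheses, then the
state letter) and only looks at the main thread of each process. The
function names are hypothetical.

```python
import os


def parse_stat(stat_text):
    """Split the start of a /proc/<pid>/stat line into (pid, comm, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so find the last ')' instead of splitting on spaces.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx].strip())
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process in uninterruptible sleep ('D')."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited meanwhile or is unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```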

The kernel tells me about the hung tasks, but I cannot really make
sense of this information.

INFO: task ls:4870 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D 0000000000000002 0 4870 4667 0x00000000
ffff880037814db0 0000000000000086 ffff880079b89a40 0000000000000000
0000000000000001 0000000000000296 000000000000f9e0 ffff880079ca1fd8
0000000000015780 0000000000015780 ffff88007a4662e0 ffff88007a4665d8
Call Trace:
[<ffffffffa048c563>] ? put_ldev+0x2d/0x6f [drbd]
[<ffffffffa01112c4>] ? dm_table_unplug_all+0x4b/0xb4 [dm_mod]
[<ffffffff8101657d>] ? read_tsc+0xa/0x20
[<ffffffff8117e023>] ? generic_make_request+0x299/0x2f9
[<ffffffff8110e146>] ? sync_buffer+0x0/0x40
[<ffffffff812fafca>] ? io_schedule+0x73/0xb7
[<ffffffff8110e181>] ? sync_buffer+0x3b/0x40
[<ffffffff812fb4d7>] ? __wait_on_bit+0x41/0x70
[<ffffffff8110e146>] ? sync_buffer+0x0/0x40
[<ffffffff812fb571>] ? out_of_line_wait_on_bit+0x6b/0x77
[<ffffffff81064f44>] ? wake_bit_function+0x0/0x23
[<ffffffffa0156a09>] ? __ext4_get_inode_loc+0x2d8/0x32e [ext4]
[<ffffffff81100b21>] ? inode_init_always+0x109/0x1aa
[<ffffffffa0156f9f>] ? ext4_iget+0x5a/0x6ed [ext4]
[<ffffffffa015f911>] ? ext4_lookup+0x83/0xe1 [ext4]
[<ffffffff810f64c3>] ? do_lookup+0xd3/0x15d
[<ffffffff810f6ef0>] ? __link_path_walk+0x5a5/0x6f5
[<ffffffff810b41a5>] ? lock_page+0x9/0x1f
[<ffffffff810f726e>] ? path_walk+0x66/0xc9
[<ffffffff810f86d8>] ? do_path_lookup+0x20/0x77
[<ffffffff810f9bba>] ? user_path_at+0x48/0x79
[<ffffffff810ccb2f>] ? handle_mm_fault+0x3b8/0x80f
[<ffffffff810d018d>] ? get_unmapped_area+0xd7/0x139
[<ffffffff810f2006>] ? vfs_fstatat+0x2c/0x57
[<ffffffff810f2087>] ? sys_newlstat+0x11/0x30
[<ffffffff812fe760>] ? do_page_fault+0x2da/0x2f2
[<ffffffff812fc605>] ? page_fault+0x25/0x30
[<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

Any input is appreciated, specifically answers to the following
questions:

- how can I fix I/O scheduling so that pvmove/kcopyd and the rest
of the system share nicely?

- how can I revive these 'D'-processes?

This is on Debian squeeze, amd64, with a 2.6.32 kernel and the cfq
scheduler. The underlying disks are WD Caviar Green (I know they
suck at performance...), and the partitions are currently not aligned
to 4k sectors, which is what I am trying to fix.
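
To illustrate what "aligned to 4k sectors" means here: a partition is
aligned when its starting byte offset is a multiple of 4096. The sketch
below assumes the drives report 512-byte logical sectors. The start
sectors 63 and 2048 are only illustrative values (the classic DOS
default and the common 1 MiB default), not the layout of this system.

```python
PHYSICAL_SECTOR = 4096  # bytes per physical sector on a 4k-sector drive


def is_4k_aligned(start_sector, logical_sector_size=512):
    """True if the partition's byte offset is a multiple of 4096."""
    return (start_sector * logical_sector_size) % PHYSICAL_SECTOR == 0


if __name__ == "__main__":
    print(is_4k_aligned(63))    # False: 63 * 512 = 32256 bytes
    print(is_4k_aligned(2048))  # True: 2048 * 512 = 1048576 bytes
```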


martin | http://madduck.net/ | http://two.sentenc.es/

"so there we have it: an ecclesiastical order with priesthood,
theology, cult, sacrament;
in short, everything that jesus of nazareth had fought against..."
- friedrich nietzsche

spamtraps: madduck.bogus@xxxxxxxxxxx
