pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

From: Florin Iucha
Date: Sun Oct 28 2007 - 11:24:41 EST


Hello,

For the past week or two I have noticed that, some time after I log
in, my keyboard input becomes a bit laggy: there is a small delay
between the keypress and the character appearing in the
terminal. This is on an AMD Athlon X2 4200+ with 2 GB RAM and just a
gnome-terminal open. The machine is as idle as possible, as monitored
via the system monitor applet. I could not get any hard data on it
until now.

After I logged off from GNOME, I switched to the text console and ran
top, with the option of showing one CPU stats line for each CPU. Lo
and behold, one core is 100% idle, and the other one is 25% idle and
75% waiting (iowait). Periodically, a pdflush process in 'D' state rises
to the top. I did an 'echo t > /proc/sysrq-trigger' and this is what
it says for the two pdflush processes:

[ 3687.824424] pdflush S ffff8100057ffef8 0 247 2
[ 3687.824427] ffff8100057ffed0 0000000000000046 ffff8100057ffe70 ffffffff8022a96c
[ 3687.824431] ffff8100057fc000 ffff810003040770 ffff8100057fc208 0000000000000297
[ 3687.824434] ffff8100057ffe90 ffff810002c1ba10 ffff8100057ffed0 ffffffff8022b9d2
[ 3687.824438] Call Trace:
[ 3687.824440] [<ffffffff8022a96c>] enqueue_task_fair+0x21/0x34
[ 3687.824444] [<ffffffff8022b9d2>] set_user_nice+0x110/0x12c
[ 3687.824448] [<ffffffff80267165>] pdflush+0x0/0x1c3
[ 3687.824451] [<ffffffff80267234>] pdflush+0xcf/0x1c3
[ 3687.824455] [<ffffffff80245876>] kthread+0x49/0x77
[ 3687.824458] [<ffffffff8020c598>] child_rip+0xa/0x12
[ 3687.824463] [<ffffffff8024582d>] kthread+0x0/0x77
[ 3687.824466] [<ffffffff8020c58e>] child_rip+0x0/0x12
[ 3687.824468]
[ 3687.824470] pdflush D ffffffff805787c0 0 248 2
[ 3687.824473] ffff810006001d90 0000000000000046 0000000000000000 0000000000000286
[ 3687.824476] ffff8100057fc770 ffff810003062000 ffff8100057fc978 0000000106001da0
[ 3687.824480] 0000000000000003 ffffffff8023b1b2 0000000000000000 0000000000000000
[ 3687.824483] Call Trace:
[ 3687.824488] [<ffffffff8023b1b2>] __mod_timer+0xb8/0xca
[ 3687.824492] [<ffffffff8055c87a>] schedule_timeout+0x8d/0xb4
[ 3687.824496] [<ffffffff8023ad6c>] process_timeout+0x0/0xb
[ 3687.824499] [<ffffffff8055c79a>] io_schedule_timeout+0x28/0x33
[ 3687.824503] [<ffffffff8026bb24>] congestion_wait+0x6b/0x87
[ 3687.824506] [<ffffffff80245983>] autoremove_wake_function+0x0/0x38
[ 3687.824510] [<ffffffff8029e684>] writeback_inodes+0xcd/0xd5
[ 3687.824514] [<ffffffff80266dc4>] wb_kupdate+0xbb/0x10d
[ 3687.824518] [<ffffffff80267165>] pdflush+0x0/0x1c3
[ 3687.824520] [<ffffffff8026727d>] pdflush+0x118/0x1c3
[ 3687.824523] [<ffffffff80266d09>] wb_kupdate+0x0/0x10d
[ 3687.824527] [<ffffffff80245876>] kthread+0x49/0x77
[ 3687.824530] [<ffffffff8020c598>] child_rip+0xa/0x12
[ 3687.824535] [<ffffffff8024582d>] kthread+0x0/0x77
[ 3687.824538] [<ffffffff8020c58e>] child_rip+0x0/0x12
[ 3687.824540]
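
For anyone digging through a much longer dump of this kind, here is a
minimal sketch that picks out the task header lines in 'D' state. It is
not part of the original report. The helper name blocked_tasks is made
up for illustration, and the column layout (name, state, address, a
numeric field, PID, parent PID) is assumed from the two header lines
above. Task names containing spaces would not match.

```python
import re

# Matches a task header line from the dump above, for example:
# "[ 3687.824470] pdflush D ffffffff805787c0 0 248 2"
# Assumption: the last two numeric columns are PID and parent PID.
TASK_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+"
    r"(?P<addr>[0-9a-f]{16})\s+\d+\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s*$"
)


def blocked_tasks(log_text):
    """Return (comm, pid) for every task header whose state is 'D'."""
    found = []
    for line in log_text.splitlines():
        m = TASK_LINE.match(line.strip())
        # Stack words and call-trace frames do not match the pattern,
        # so only the per-task header lines are considered here.
        if m and m.group("state") == "D":
            found.append((m.group("comm"), int(m.group("pid"))))
    return found
```

Fed the saved kernel log as one string, it should report only the
second pdflush (PID 248) for the excerpt above.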

What could cause this? I use NFS4 to automount the home directories
from a Solaris 10 server, and this box has already turned up a few bugs
in the NFS4 code (fixed in the 2.6.22 kernel).

I'll try running 2.6.23 again for a few days, to see whether pdflush
gets stuck there as well. Any other ideas?
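
While comparing kernels, one way to catch tasks in 'D' state without
watching top is to read the state field from /proc/<pid>/stat. The
following is only a sketch under that assumption. The function names
parse_stat and d_state_tasks are hypothetical, and it needs a mounted
/proc.

```python
import os


def parse_stat(stat_text):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # ')', so split on the first '(' and the last ')'.
    left, _, rest = stat_text.partition("(")
    comm, _, tail = rest.rpartition(")")
    return int(left), comm, tail.split()[0]


def d_state_tasks(proc_root="/proc"):
    """Return (pid, comm) for every task currently in state 'D'."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited or the entry was unreadable
        if state == "D":
            tasks.append((pid, comm))
    return tasks
```

Calling d_state_tasks() every few seconds and logging the result with a
timestamp would give a rough record of how often pdflush is blocked.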

florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163
