Processes stuck in unkillable D state (now seen in 2.6.7-mm6)
From: Rob Mueller
Date: Thu Jul 08 2004 - 17:16:17 EST
This is an update to a thread I started last week about processes getting
stuck in D state.
About 2 days ago, we upgraded to 2.6.7-mm6. Things have generally been
running fine, but today some processes again got stuck in an unkillable D
state. This time, however, rather than a single process, about 20 got stuck
within a relatively short period (roughly half an hour). All of the
processes are cyrus imapd processes.
I've tried to get sysrq-t output, but since this machine is still up and
running with about 2500 processes on it, I can't seem to get consistent
sysrq-t output. I set the kernel log buffer shift to 17 (128k), but that
definitely isn't enough. The output also gets written to
/var/log/messages, and I get more of it there, but it still isn't a
complete process list, and each time I do a sysrq-t I get a different
number of procs (though always incomplete) in the output.
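One way to get a complete list of D-state tasks without depending on the log
buffer might be to read the state field from /proc directly. The following is
only a rough sketch: the helper names parse_stat and d_state_tasks are made up
for illustration, and it assumes /proc/<pid>/stat is still readable for the
stuck processes, which has not been verified on this machine.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat record into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' rather than on whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]


def d_state_tasks(proc="/proc"):
    """Return sorted (pid, comm) pairs for every task currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited during the scan, or unexpected format
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Calling d_state_tasks() a few times some seconds apart and keeping only the
pids that show up every time should separate the permanently stuck tasks
from ones that are merely waiting on normal disk I/O.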
Anyway, I've done sysrq-t twice, and captured the output from both
dmesg -s 1000000 and /var/log/messages. Since the output is so big, I've put
the logs and the kernel config here:
http://robm.fastmail.fm/kernel/t1/
The process IDs that are definitely stuck are:
1013, 13389, 13469, 16056, 17340, 18489, 21341, 22661, 23976, 29138, 29752,
30330, 31106, 31956, 32559, 32575, 3753, 5926, 6052, 8857, 9914
But as mentioned above, you won't find most of these in the sysrq-t output,
I presume because the buffer isn't big enough. Still, hopefully the ones you
can see there will provide some useful information. (FYI, searching for
imapd\s+D in the sysrq-t output rather than for the individual pids seems
to be a quicker way of finding the problem procs.)
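The search above can be scripted along these lines. This is a minimal
sketch, not a tested tool: the function name find_d_state is hypothetical, and
the header layout (command, state letter, an 8-digit hex address, a number,
then the pid) is inferred only from the excerpts quoted in this mail.

```python
import re

# Task header lines in the dump look like:
#   imapd D F1778660 0 3753 1906 3754 809 (NOTLB)
# Using search() rather than match() lets the pattern skip any syslog
# prefix when the input comes from /var/log/messages.
HEADER = re.compile(r"(\S+)\s+D\s+[0-9A-Fa-f]{8}\s+\d+\s+(\d+)\s")


def find_d_state(lines, comm=None):
    """Return (command, pid) pairs for task headers that show state D.

    If comm is given, only tasks with that command name are returned.
    """
    found = []
    for line in lines:
        m = HEADER.search(line)
        if m and (comm is None or m.group(1) == comm):
            found.append((m.group(1), int(m.group(2))))
    return found
```

Passing an open log file (for example sysreqmsglog1.txt) as lines with
comm="imapd" gives the stuck imapd pids that actually made it into the dump,
which can then be compared against the pid list above.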
Having had a quick look myself, there are some odd things in there. For
instance, from sysreqmsglog1.txt:
imapd D F1778660 0 3753 1906 3754 809 (NOTLB)
eb15adb8 00000086 00000020 f1778660 c0310318 c43fc600 08155888 0000002d
f567d380 f7b97480 c42c3d20 00000000 0001ece6 6051d45f 00007c67 c42c3d20
c03d8180 f1778660 f1778810 f78ad9cc 00000003 f78ad9cc f78ad9cc c025d40c
Call Trace:
[<c0310318>] memcpy_fromiovec+0x38/0x60
[<c025d40c>] generic_unplug_device+0x2c/0x40
[<c037a288>] io_schedule+0x28/0x40
[<c012e17c>] __lock_page+0xbc/0xe0
[<c012deb0>] page_wake_function+0x0/0x50
[<c012deb0>] page_wake_function+0x0/0x50
[<c012f1a1>] filemap_nopage+0x231/0x360
[<c013dd58>] do_no_page+0xb8/0x3a0
[<c013bbbb>] pte_alloc_map+0xdb/0xf0
[<c013e1ee>] handle_mm_fault+0xbe/0x1a0
[<c0112c62>] do_page_fault+0x172/0x5ec
[<c012435b>] do_sigaction+0x19b/0x210
[<c0120dac>] update_process_times+0x2c/0x40
[<c0110230>] smp_apic_timer_interrupt+0x140/0x150
[<c0112af0>] do_page_fault+0x0/0x5ec
[<c0104b19>] error_code+0x2d/0x38
imapd D E59812C0 0 22661 1906 23248 22592 (NOTLB)
d54f5db8 00000086 f7b7de18 e59812c0 d54f5d94 c04b0dc0 00000020 00000000
c42c3060 f71696f0 c42c3d20 00000000 0002cda6 891b682d 00007b15 c42c3d20
f71696f0 e59812c0 e5981470 00000003 c025d3bb f78ad9cc f78ad9cc c025d40c
Call Trace:
[<c025d3bb>] __generic_unplug_device+0x1b/0x40
[<c025d40c>] generic_unplug_device+0x2c/0x40
[<c037a288>] io_schedule+0x28/0x40
[<c012e17c>] __lock_page+0xbc/0xe0
[<c012deb0>] page_wake_function+0x0/0x50
[<c012deb0>] page_wake_function+0x0/0x50
[<c012f1a1>] filemap_nopage+0x231/0x360
[<c013dd58>] do_no_page+0xb8/0x3a0
[<c013bbbb>] pte_alloc_map+0xdb/0xf0
[<c013e1ee>] handle_mm_fault+0xbe/0x1a0
[<c0112af0>] do_page_fault+0x0/0x5ec
[<c0104a5a>] apic_timer_interrupt+0x1a/0x20
[<c0112c62>] do_page_fault+0x172/0x5ec
[<c012435b>] do_sigaction+0x19b/0x210
[<c0124693>] sys_rt_sigaction+0x53/0x90
[<c030c631>] sys_socketcall+0x111/0x200
[<c0112af0>] do_page_fault+0x0/0x5ec
[<c0104b19>] error_code+0x2d/0x38
Those calls into "generic_unplug_device" look really strange to me...
Rob