Re: [BUG] 2.6.26-rc1-git4 - task blocked on powerpc for more than120 seconds

From: Nadia Derbey
Date: Mon May 19 2008 - 04:23:16 EST


Kamalesh Babulal wrote:
Nadia Derbey wrote:

Kamalesh Babulal wrote:

Adrian Bunk wrote:


On Wed, May 07, 2008 at 06:52:53PM +0530, Kamalesh Babulal wrote:


While running the ltp over the powerpc with the 2.6.26-rc1-git4 kernel,
task get blocked for more 120 seconds and does not proceeds future.

The task msgctl08 is a ipc testcase, which test the msgget(2) msgctl(2)
syscalls.

Thanks for your report.

I assume this is reproducible?

If yes, what was the last kernel that worked for you?

This is reproducible on 2.6.26-rc1 and 2.6.26-rc1-git(s). 2.6.25 is the last know kernel
to work.



I've also added Pierre Peiffer and Nadia Derbey to the Cc since their recent ipc commits are my first suspects.



Machine is stuck in the task, printing the following call trace
more than 5000 times. The oom-killer invoked once in-between.

INFO: task msgctl08:16248 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call Trace:
[c0000000762435a0] [0000000000000001] 0x1 (unreliable)
[c000000076243770] [c000000000010acc] .__switch_to+0x128/0x16c
[c000000076243800] [c0000000004b07a4] .schedule+0x7ac/0x890
[c0000000762438f0] [c0000000004b2ba4] .rwsem_down_failed_common+0x260/0x2b0
[c0000000762439b0] [c0000000004b2c60] .rwsem_down_read_failed+0x2c/0x44
[c000000076243a60] [c0000000004b1b84] .down_read+0x44/0x5c
[c000000076243af0] [c000000000295e10] .ipc_lock+0x34/0xe0
[c000000076243b90] [c00000000029690c] .ipc_lock_check+0x24/0x78
[c000000076243c20] [c0000000002972c0] .do_msgsnd+0xb0/0x3f8
[c000000076243d10] [c000000000293ce8] .compat_sys_msgsnd+0x94/0xc0
[c000000076243db0] [c000000000014418] .compat_sys_ipc+0x130/0x1f4
[c000000076243e30] [c000000000008734] syscall_exit+0x0/0x40
msgctl08 invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
Call Trace:
[c00000001e1c7640] [c0000000000101bc] .show_stack+0x70/0x1bc (unreliable)
[c00000001e1c76f0] [c0000000000c2c78] .oom_kill_process+0x78/0x230
[c00000001e1c77a0] [c0000000000c3390] .out_of_memory+0x28c/0x320
[c00000001e1c7870] [c0000000000c70ac] .__alloc_pages_internal+0x364/0x468
[c00000001e1c7980] [c0000000000e83cc] .alloc_page_vma+0x120/0x14c
[c00000001e1c7a20] [c0000000000e0224] .read_swap_cache_async+0x7c/0x160
[c00000001e1c7ae0] [c0000000000e035c] .swapin_readahead+0x54/0xd4
[c00000001e1c7ba0] [c0000000000d47e8] .handle_mm_fault+0x520/0x9d8
[c00000001e1c7c80] [c0000000004b5054] .do_page_fault+0x440/0x624
[c00000001e1c7e30] [c0000000000053fc] handle_page_fault+0x20/0x5c
Mem-info:
Node 0 DMA per-cpu:
CPU 0: hi: 6, btch: 1 usd: 1
CPU 1: hi: 6, btch: 1 usd: 1
Node 1 DMA per-cpu:
CPU 0: hi: 6, btch: 1 usd: 5
CPU 1: hi: 6, btch: 1 usd: 5
Active:31 inactive:2628 dirty:1 writeback:6 unstable:0
free:156 slab:13071 mapped:548 pagetables:41109 bounce:0
Node 0 DMA free:5312kB min:5760kB low:7168kB high:8640kB active:1024kB inactive:768kB present:3928832kB pages_scanned:3912 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Node 1 DMA free:4672kB min:4992kB low:6208kB high:7488kB active:960kB inactive:169792kB present:3404992kB pages_scanned:140140 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Node 0 DMA: 5*64kB 15*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB 0*8192kB 0*16384kB = 5312kB
Node 1 DMA: 8*64kB 5*128kB 6*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB 0*8192kB 0*16384kB = 4736kB
8185 total pagecache pages
Swap cache: add 29545, delete 21888, find 5602/10373
Free swap = 803712kB
Total swap = 2048128kB
114688 pages of RAM
766 reserved pages
231218 pages shared
7657 pages swap cached
Out of memory: kill process 15371 (msgctl08) score 39223 or a child
Killed process 15373 (msgctl08)
INFO: task msgctl08:15385 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call Trace:
[c0000000e9a43290] [0000000000000001] 0x1 (unreliable)
[c0000000e9a43460] [c000000000010acc] .__switch_to+0x128/0x16c
[c0000000e9a434f0] [c0000000004b07a4] .schedule+0x7ac/0x890
[c0000000e9a435e0] [c0000000004b0d90] .io_schedule+0x7c/0xe8
[c0000000e9a43670] [c0000000002d7f44] .get_request_wait+0x15c/0x1e0
[c0000000e9a43750] [c0000000002d8950] .__make_request+0x3ec/0x4d8
[c0000000e9a43800] [c0000000002d6624] .generic_make_request+0x35c/0x3b0
[c0000000e9a438e0] [c0000000002d81b0] .submit_bio+0x118/0x140
[c0000000e9a439a0] [c0000000000dfa88] .swap_readpage+0x94/0xb4
[c0000000e9a43a20] [c0000000000e02b8] .read_swap_cache_async+0x110/0x160
[c0000000e9a43ae0] [c0000000000e035c] .swapin_readahead+0x54/0xd4
[c0000000e9a43ba0] [c0000000000d47e8] .handle_mm_fault+0x520/0x9d8
[c0000000e9a43c80] [c0000000004b5054] .do_page_fault+0x440/0x624
[c0000000e9a43e30] [c0000000000053fc] handle_page_fault+0x20/0x5c
INFO: task msgctl08:15405 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call Trace:
[c0000000b1757290] [0000000000000001] 0x1 (unreliable)
[c0000000b1757460] [c000000000010acc] .__switch_to+0x128/0x16c
[c0000000b17574f0] [c0000000004b07a4] .schedule+0x7ac/0x890
[c0000000b17575e0] [c0000000004b0d90] .io_schedule+0x7c/0xe8
[c0000000b1757670] [c0000000002d7f44] .get_request_wait+0x15c/0x1e0
[c0000000b1757750] [c0000000002d8950] .__make_request+0x3ec/0x4d8
[c0000000b1757800] [c0000000002d6624] .generic_make_request+0x35c/0x3b0
[c0000000b17578e0] [c0000000002d81b0] .submit_bio+0x118/0x140
[c0000000b17579a0] [c0000000000dfa88] .swap_readpage+0x94/0xb4
[c0000000b1757a20] [c0000000000e02b8] .read_swap_cache_async+0x110/0x160
[c0000000b1757ae0] [c0000000000e035c] .swapin_readahead+0x54/0xd4
[c0000000b1757ba0] [c0000000000d47e8] .handle_mm_fault+0x520/0x9d8
[c0000000b1757c80] [c0000000004b5054] .do_page_fault+0x440/0x624
[c0000000b1757e30] [c0000000000053fc] handle_page_fault+0x20/0x5c
INFO: task msgctl08:15406 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call Trace:
[c0000000eec53290] [0000000000000001] 0x1 (unreliable)
[c0000000eec53460] [c000000000010acc] .__switch_to+0x128/0x16c
[c0000000eec534f0] [c0000000004b07a4] .schedule+0x7ac/0x890
[c0000000eec535e0] [c0000000004b0d90] .io_schedule+0x7c/0xe8
[c0000000eec53670] [c0000000002d7f44] .get_request_wait+0x15c/0x1e0
[c0000000eec53750] [c0000000002d8950] .__make_request+0x3ec/0x4d8
[c0000000eec53800] [c0000000002d6624] .generic_make_request+0x35c/0x3b0
[c0000000eec538e0] [c0000000002d81b0] .submit_bio+0x118/0x140
[c0000000eec539a0] [c0000000000dfa88] .swap_readpage+0x94/0xb4
[c0000000eec53a20] [c0000000000e02b8] .read_swap_cache_async+0x110/0x160
[c0000000eec53ae0] [c0000000000e035c] .swapin_readahead+0x54/0xd4
[c0000000eec53ba0] [c0000000000d47e8] .handle_mm_fault+0x520/0x9d8
[c0000000eec53c80] [c0000000004b5054] .do_page_fault+0x440/0x624
[c0000000eec53e30] [c0000000000053fc] handle_page_fault+0x20/0x5c

Kamalesh,

I'm wondering whether this is not related to a problem found in LTP after msgmni had been changed from 16 to a computed value.

Does your ltp test suite contain the patch I had sent in thread http://lkml.org/lkml/2008/2/21/123?

If you want to have a look at it, discussion started in thread http://lkml.org/lkml/2008/2/16/6

Regards,
Nadia


Hi Nadia,
Thanks, I am using the LTP-20080430 and tried it with the 2.6.26-rc2-git4 kernel.
but the call trace remains the same.


Ok, I'll try to find a ppc machine: could not reproduce it on my Xeon machine (but actually it was with a 2.6.26-rc1 kernel).

Regards,
Nadia

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/