I need tips on how to debug a deadlock involving swap

From: Richard Yao
Date: Mon May 07 2012 - 11:54:22 EST


I have a deadlock that occurs when I swap to a virtual block device. The
driver is out-of-tree and processes IO requests in worker threads.
Setting PF_MEMALLOC on those threads prevents the deadlock, but it has
the side effect of letting them grab pages from ZONE_DMA, which is bad.
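
By "setting PF_MEMALLOC" I mean roughly the following in each worker
thread. This is a simplified sketch, not the actual driver code:

#include <linux/sched.h>
#include <linux/kthread.h>

static int io_worker(void *data)
{
	/*
	 * Mark the worker PF_MEMALLOC so that allocations made while
	 * servicing IO can dip into the reserves instead of entering
	 * direct reclaim (which may issue more IO back to this device).
	 */
	current->flags |= PF_MEMALLOC;

	while (!kthread_should_stop()) {
		/* dequeue and service one IO request ... */
	}

	return 0;
}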

I believe that direct reclaim is being triggered during swap, so that
swap operations holding locks end up depending on other swap operations
that need those same locks, but I am having trouble identifying exactly
how that happens.
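
To make the cycle I suspect concrete, here is a stripped-down
illustration (the names are made up; this is not the driver's code):

#include <linux/mutex.h>
#include <linux/slab.h>

struct vdev {
	struct mutex io_lock;	/* serializes request processing */
};

static int service_request(struct vdev *dev, size_t len)
{
	void *buf;
	int error = 0;

	mutex_lock(&dev->io_lock);

	/*
	 * GFP_KERNEL allows the allocator to enter direct reclaim.  If
	 * reclaim decides to write out a dirty swap page whose backing
	 * store is this very device, the resulting write is queued to
	 * this driver and cannot make progress until a worker can take
	 * dev->io_lock, which this thread holds while it waits.
	 * GFP_NOIO (or PF_MEMALLOC on the thread) avoids starting IO
	 * from inside the allocation.
	 */
	buf = kmalloc(len, GFP_KERNEL);
	if (!buf) {
		error = -ENOMEM;
		goto out;
	}

	/* ... fill buf and complete the request ... */

	kfree(buf);
out:
	mutex_unlock(&dev->io_lock);
	return error;
}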

The deadlock occurs in the IO worker threads, but the hung task timeout
only provides a backtrace for the thread that triggered the IO request,
not for the workers themselves, which is not helpful:

[ 218.252066] INFO: task python2.7:7027 blocked for more than 15 seconds.
[ 218.252070] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 218.252073] python2.7 D ffffffff814051e0 0 7027 7022
0x00000000
[ 218.252079] ffff8801b4c73798 0000000000000086 ffff8801b4c73758
ffff8801b4c73758
[ 218.252085] ffff880224b84a40 ffff8801b4c73fd8 ffff8801b4c73fd8
ffff8801b4c73fd8
[ 218.252091] ffff8802268aca40 ffff880224b84a40 ffff8801b4c73768
ffff88022fc91738
[ 218.252097] Call Trace:
[ 218.252105] [<ffffffff810b8300>] ? __lock_page+0x70/0x70
[ 218.252111] [<ffffffff8133241a>] schedule+0x3a/0x50
[ 218.252114] [<ffffffff813324ba>] io_schedule+0x8a/0xd0
[ 218.252118] [<ffffffff810b8309>] sleep_on_page+0x9/0x10
[ 218.252121] [<ffffffff81330547>] __wait_on_bit+0x57/0x80
[ 218.252131] [<ffffffff810c097e>] ? account_page_writeback+0xe/0x10
[ 218.252134] [<ffffffff810b84e0>] wait_on_page_bit+0x70/0x80
[ 218.252137] [<ffffffff81052da0>] ? autoremove_wake_function+0x40/0x40
[ 218.252141] [<ffffffff810c7245>] shrink_page_list+0x465/0x8f0
[ 218.252144] [<ffffffff810c7cf9>] shrink_inactive_list+0x379/0x470
[ 218.252147] [<ffffffff81336c2d>] ? sub_preempt_count+0x9d/0xd0
[ 218.252150] [<ffffffff810c8261>] shrink_mem_cgroup_zone+0x471/0x570
[ 218.252153] [<ffffffff810c8e0b>] do_try_to_free_pages+0xfb/0x420
[ 218.252156] [<ffffffff810c9251>] try_to_free_pages+0x71/0x80
[ 218.252159] [<ffffffff810c04f9>] __alloc_pages_nodemask+0x469/0x7a0
[ 218.252162] [<ffffffff810c3750>] ? __put_single_page+0x30/0x30
[ 218.252166] [<ffffffff810fa36c>] do_huge_pmd_anonymous_page+0x14c/0x350
[ 218.252170] [<ffffffff810d69cf>] handle_mm_fault+0x13f/0x2f0
[ 218.252172] [<ffffffff8133662e>] do_page_fault+0x14e/0x590
[ 218.252176] [<ffffffff81061739>] ? set_next_entity+0x39/0x80
[ 218.252179] [<ffffffff81062a8b>] ? pick_next_task_fair+0x6b/0x150
[ 218.252181] [<ffffffff8105dcf1>] ? get_parent_ip+0x11/0x50
[ 218.252184] [<ffffffff81336c2d>] ? sub_preempt_count+0x9d/0xd0
[ 218.252186] [<ffffffff81331fb8>] ? __schedule+0x2f8/0x6c0
[ 218.252189] [<ffffffff81333a75>] page_fault+0x25/0x30

Is there any way that I can ask the kernel to print stack traces of the
worker threads on demand?
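
What I am imagining is a hook I could poke from userspace while the
workers are wedged and have their kernel stacks dumped to the log.  A
rough sketch of that idea is below; the names are made up, and I am
assuming sched_show_task() (the helper the sysrq task dump uses) is
usable from a module, which may not be the case:

#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/sched.h>

/* Hypothetical: workers[] holds the task_structs of the IO worker
 * threads created with kthread_run(). */
extern struct task_struct *workers[];
extern int nr_workers;

static ssize_t dump_workers_write(struct file *file, const char __user *buf,
				  size_t count, loff_t *ppos)
{
	int i;

	/* Dump the kernel stack of every worker thread to dmesg. */
	for (i = 0; i < nr_workers; i++)
		if (workers[i])
			sched_show_task(workers[i]);

	return count;
}

static const struct file_operations dump_workers_fops = {
	.owner = THIS_MODULE,
	.write = dump_workers_write,
};

/* debugfs_create_file("dump_workers", 0200, NULL, NULL, &dump_workers_fops); */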
