Re: multi-second application stall in open()

From: Josh Hunt
Date: Tue Jun 26 2012 - 16:37:28 EST


On Tue, Jun 26, 2012 at 10:53 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Tue, Jun 26, 2012 at 10:18:04AM -0500, Josh Hunt wrote:
>> On Tue, Jun 26, 2012 at 7:59 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>> > On Mon, Jun 25, 2012 at 11:01:48PM -0500, Josh Hunt wrote:
>> >
>> > [..]
>> >> So this really seems like a problem with kblockd not kicking in. I've
>> >> instrumented every path in select_queue and it's not getting hit after
>> >> schedule dispatch. Everything seems to stall at that point until a new
>> >> request comes in.
>> >
>> > Ok, that's cool. So now we need to find out why queued work is not being
>> > scheduled.
>> >
>> > I think there are some workqueue related trace points. If you enable those
>> > along with blktraces, that should give tejun some data to look at.
>> >
>> > Thanks
>> > Vivek
>>
>> Tejun
>>
>> Do you have any suggestions on how to debug this?
>>
>> I did "perf record -a -e workqueue:*" and grabbed some tracepoint
>> data, but it's hard to correlate when these events are occurring in
>> the blktrace logs. Will keep investigating.
>
> If you capture blktrace logs through trace_pipe and not blktrace
> tool, you will get both workqueue and block traces with time stamps and
> then correlating these becomes easier. So just enable "blk" tracer, enable
> tracing on that particular device and then enable certaion workqueue
> related trace events and capture trace_pipe output.
>
> thanks
> Vivek

Thanks Vivek! Always nice to learn some new debugging tricks! :)

I don't know if these will paste in very well so I'm attaching some
snippets from the log I captured as well. Here is the bad case:

testd-7261 [000] 9527.900483: 8,0 m N cfq14995S
complete rqnoidle 0
testd-7261 [000] 9527.900485: 8,0 m N cfq14995S
arm slice timer
testd-7261 [000] 9527.900486: 8,0 m N cfq14995S
Idling. st->count:1, hw_tag:0
testd-7261 [000] 9527.900488: 8,0 m N cfq14995S
arm_idle: 8 group_idle: 0
testd-7261 [000] 9527.900489: 8,0 m N cfq schedule
dispatch: busy_queues:1 rq_queued:0 rq_in_driver:0 rq_in_flight[0]:0
[1]:0
testd-7261 [000] 9527.900494: 8,0 m N cfq
select_queue: !rq_queued
testd-7261 [000] 9527.907998: 8,0 m N cfq idle timer fired
testd-7261 [000] 9527.908003: 8,0 m N cfq14995S
idle_slice_timer: active queue
testd-7261 [000] 9527.908004: 8,0 m N cfq14995S
idle_slice_timer: clear_cfqq_deep
testd-7261 [000] 9527.908006: 8,0 m N cfq14995S
slice expired t=0
testd-7261 [000] 9527.908007: 8,0 m N cfq14995S del timer
testd-7261 [000] 9527.908008: 8,0 m N cfq14995S
update idle time stats
testd-7261 [000] 9527.908011: 8,0 m N cfq14995S
sl_used=9 disp=2 charge=9 iops=0 sect=16
testd-7261 [000] 9527.908012: 8,0 m N cfq14995S
slice_expired del cfqq from rr - on_rr:1 rb empty:1
testd-7261 [000] 9527.908014: 8,0 m N cfq14995S
del_from_rr queued[0]:0 queued[1]:0
testd-7261 [000] 9527.908015: 8,0 m N cfq14995S
slice_expired resort_rr_list
testd-7261 [000] 9527.908016: 8,0 m N cfq14995S
slice_expired setting active_queue to NULL
testd-7261 [000] 9527.908017: 8,0 m N cfq14995S
slice_expired setting active_cic to NULL
testd-7261 [000] 9527.908019: 8,0 m N cfq schedule
dispatch: busy_queues:1 rq_queued:1 rq_in_driver:0 rq_in_flight[0]:0
[1]:0
flush-8:0-6082 [001] 9528.075769: 8,0 A W 28304865 + 8
<- (8,3) 7340040
flush-8:0-6082 [001] 9528.075773: block_bio_remap: 8,0 W
28304865 + 8 <- (8,3) 7340040

The print "cfq schedule dispatch: busy_queues:1 rq_queued:1
rq_in_driver:0 rq_in_flight[0]:0 [1]:0" is inside the if to call
kblockd_schedule_work() so it is getting called, but you can see that
we never see the workqueue_queue_work tracepoint being hit.

Here's what it looks like in the good case:

<idle>-0 [000] 9523.042437: 8,0 m N cfq idle timer fired
<idle>-0 [000] 9523.042441: 8,0 m N cfq6082S
idle_slice_timer: active queue
<idle>-0 [000] 9523.042442: 8,0 m N cfq6082S
idle_slice_timer: clear_cfqq_deep
<idle>-0 [000] 9523.042443: 8,0 m N cfq6082S
slice expired t=0
<idle>-0 [000] 9523.042444: 8,0 m N cfq6082S del timer
<idle>-0 [000] 9523.042445: 8,0 m N cfq6082S
update idle time stats
<idle>-0 [000] 9523.042447: 8,0 m N cfq6082S
sl_used=8 disp=1 charge=8 iops=0 sect=8
<idle>-0 [000] 9523.042449: 8,0 m N cfq6082S
slice_expired del cfqq from rr - on_rr:1 rb empty:1
<idle>-0 [000] 9523.042450: 8,0 m N cfq6082S
del_from_rr queued[0]:0 queued[1]:0
<idle>-0 [000] 9523.042451: 8,0 m N cfq6082S
slice_expired resort_rr_list
<idle>-0 [000] 9523.042452: 8,0 m N cfq6082S
slice_expired setting active_queue to NULL
<idle>-0 [000] 9523.042453: 8,0 m N cfq6082S
slice_expired setting active_cic to NULL
<idle>-0 [000] 9523.042455: 8,0 m N cfq schedule
dispatch: busy_queues:1 rq_queued:1 rq_in_driver:0 rq_in_flight[0]:0
[1]:0
<idle>-0 [000] 9523.042456: workqueue_queue_work: work
struct=ffff880108db8a48 function=cfq_kick_queue
workqueue=ffff88010cf5e800 req_cpu=0 cpu=0
<idle>-0 [000] 9523.042456: workqueue_activate_work:
work struct ffff880108db8a48
kworker/0:1-12940 [000] 9523.042462: workqueue_execute_start:
work struct ffff880108db8a48: function cfq_kick_queue
kworker/0:1-12940 [000] 9523.042462: 8,0 m N cfq kick queue
kworker/0:1-12940 [000] 9523.042464: 8,0 m N cfq
select_queue: !cfqq
kworker/0:1-12940 [000] 9523.042464: 8,0 m N cfq
select_queue: cfq_choose_cfqg
kworker/0:1-12940 [000] 9523.042466: 8,0 m N cfq workload slice:40
kworker/0:1-12940 [000] 9523.042466: 8,0 m N cfq
select_queue: setting active queue to new_cfqq:NULL
kworker/0:1-12940 [000] 9523.042468: 8,0 m N cfq5979A
set_active wl_prio:0 wl_type:0

Tejun - if you could let me know what other data I should be looking
for/need to enable in the traces I would appreciate it.

Thanks!
--
Josh

Attachment: workqueue.trace
Description: Binary data