Re: multi-second application stall in open()

From: Josh Hunt
Date: Fri Jun 22 2012 - 16:53:03 EST


On Fri, Jun 22, 2012 at 3:42 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Fri, Jun 22, 2012 at 03:22:01PM -0500, Josh Hunt wrote:
>
> [..]
>> It looks like my statement about busy_queues isn't entirely accurate.
>> I'm seeing this later on in the trace:
>
> Yes it is being called some times. But it is interesting to see that in
> the instance you pasted in last mail, we did not see "schedule_dispatch"
> message.  Are you sure that other requests were queued at that point of
> time and there were busy_queues? You can try printing cfqd->rq_queued
> to verify that.
>
> Also put some message in cfq_select_queue() to see why it did not select
> a new queue before the stall.
>
> Thanks
> Vivek

I think you may be correct that at those points in time there weren't
any busy queues. I do know that there were requests waiting during
this time:

Insert:
8,0 1 296677 4847.796879805 6076 P N [flush-8:0]
8,0 1 296678 4847.796881202 6076 I W 32800441 + 8 (
2794) [flush-8:0]
8,0 1 0 4847.796883157 0 m N cfq5917 insert_request
8,0 1 296679 4847.796884834 6076 U N [flush-8:0] 1

Stall (one of many b/t insert and dispatch):
8,0 1 296851 4852.776942694 8216 U N [httpdmon] 1
8,0 1 0 4852.776945488 0 m N cfq select_queue: !cfqq
8,0 1 0 4852.776946885 0 m N cfq select_queue:
cfq_choose_cfqg
8,0 1 0 4852.776948281 0 m N cfq select_queue:
setting active queue to new_cfqq:NULL
8,0 1 0 4852.776950237 0 m N cfq8216 set_active
wl_prio:0 wl_type:1
8,0 1 0 4852.776951634 0 m N cfq8216 del timer
8,0 1 0 4852.776952751 0 m N cfq8216 update idle time stats
8,0 1 0 4852.776954427 0 m N cfq8216 Idling on
sync. st->count:1
8,0 1 0 4852.776957780 0 m N cfq8216 fifo= (null)
8,0 1 0 4852.776959456 0 m N cfq8216 dispatch_insert
8,0 1 0 4852.776961412 0 m N cfq8216 remove request
8,0 1 0 4852.776963647 0 m N cfq8216 dispatched a request
8,0 1 0 4852.776965043 0 m N cfq8216 activate rq, drv=1
8,0 1 296852 4852.776966161 8216 D R 13529455 + 8 (
34642) [httpdmon]
8,0 1 0 4852.777044943 0 m N cfq8216 Idling on
sync. st->count:1
8,0 1 0 4852.777046340 0 m N cfq8216 select_queue:
dispatched and should_idle - keep_queue
8,0 0 440281 4852.777130150 7284 C R 13529455 + 8 ( 163989) [0]
8,0 0 0 4852.777145236 0 m N cfq8216 complete rqnoidle 0
8,0 0 0 4852.777147471 0 m N cfq8216 set_slice=100
8,0 0 0 4852.777149427 0 m N cfq8216 arm slice timer
8,0 0 0 4852.777151103 0 m N cfq8216 Idling on
sync. st->count:1
8,0 0 0 4852.777153058 0 m N cfq8216 arm_idle: 8
group_idle: 0
8,0 0 0 4847.817116995 0 m N cfq schedule dispatch
8,0 0 0 4847.817120068 0 m N cfq7646 select_queue:
timer_pending - keep_queue
8,0 0 0 4847.824613293 0 m N cfq idle timer fired
8,0 0 0 4847.824616366 0 m N cfq7646
idle_slice_timer: active queue
8,0 0 0 4847.824617763 0 m N cfq7646
idle_slice_timer: clear_cfqq_deep
8,0 0 0 4847.824619439 0 m N cfq7646 slice expired t=0
8,0 0 0 4847.824621395 0 m N cfq7646 del timer
8,0 0 0 4847.824622792 0 m N cfq7646 update idle time stats
8,0 0 0 4847.824625865 0 m N cfq7646 sl_used=12
disp=3 charge=12 iops=0 sect=24
8,0 0 0 4847.824627262 0 m N cfq7646 slice_expired
del cfqq from rr
8,0 0 0 4847.824628379 0 m N cfq7646 del_from_rr
8,0 0 0 4847.824629776 0 m N cfq7646 slice_expired
resort_rr_list
8,0 0 0 4847.824631452 0 m N cfq7646 slice_expired
setting active_queue to NULL
8,0 0 0 4847.824632849 0 m N cfq7646 slice_expired
setting active_cic to NULL
8,0 0 0 4847.824634246 0 m N cfq schedule dispatch
8,0 1 296749 4848.309940967 7646 A R 33545872 + 8 <- (8,4) 2152
8,0 1 296750 4848.309956053 7646 Q R 33545872 + 8 [httpdmon]

Dispatch:
8,0 0 0 4857.832310760 0 m N cfq5917 activate rq, drv=2
8,0 0 440336 4857.832312157 7284 D W 32800441 + 8
(10035430955) [testd]
8,0 0 440337 4857.832447371 7284 C W 2076271 + 8 ( 687247) [0]
8,0 0 0 4857.832457987 0 m N cfq5917 complete rqnoidle 0

In this case you see the schedule dispatch message so there were busy
queues, but for whatever reason nothing gets run. Will keep digging.

--
Josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/