Re: multi-second application stall in open()

From: Vivek Goyal
Date: Tue Jun 26 2012 - 11:53:31 EST


On Tue, Jun 26, 2012 at 10:18:04AM -0500, Josh Hunt wrote:
> On Tue, Jun 26, 2012 at 7:59 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > On Mon, Jun 25, 2012 at 11:01:48PM -0500, Josh Hunt wrote:
> >
> > [..]
> >> So this really seems like a problem with kblockd not kicking in. I've
> >> instrumented every path in select_queue and it's not getting hit after
> >> schedule dispatch. Everything seems to stall at that point until a new
> >> request comes in.
> >
> > Ok, that's cool. So now we need to find out why queued work is not being
> > scheduled.
> >
> > I think there are some workqueue related trace points. If you enable those
> > along with blktraces, that should give tejun some data to look at.
> >
> > Thanks
> > Vivek
>
> Tejun
>
> Do you have any suggestions on how to debug this?
>
> I did "perf record -a -e workqueue:*" and grabbed some tracepoint
> data, but it's hard to correlate when these events are occurring in
> the blktrace logs. Will keep investigating.

If you capture the blktrace logs through trace_pipe rather than the
blktrace tool, you will get both workqueue and block traces with
timestamps in one stream, and correlating them becomes easier. So just
enable the "blk" tracer, enable tracing on that particular device, then
enable certain workqueue related trace events and capture the
trace_pipe output.
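
Something along these lines should do it. This is only a rough sketch:
it assumes debugfs is mounted at /sys/kernel/debug, that the kernel is
built with CONFIG_BLK_DEV_IO_TRACE, and that it runs as root. The helper
names (write_knob, enable_tracing, capture) are made up for illustration,
and it enables the whole workqueue event group rather than picking
individual events.

```python
import os

# Assumption: debugfs is mounted here; adjust if your system differs.
TRACING = "/sys/kernel/debug/tracing"
SYSBLOCK = "/sys/block"


def write_knob(path, value):
    """Write a value to a tracing control file."""
    with open(path, "w") as f:
        f.write(value)


def enable_tracing(dev, tracing=TRACING, sysblock=SYSBLOCK):
    """Enable the blk tracer, per-device block tracing and workqueue events."""
    # Select the "blk" tracer so block events go through ftrace.
    write_knob(os.path.join(tracing, "current_tracer"), "blk")
    # Turn on block tracing for this one device only.
    write_knob(os.path.join(sysblock, dev, "trace", "enable"), "1")
    # Enable every trace event in the workqueue group.
    write_knob(os.path.join(tracing, "events", "workqueue", "enable"), "1")


def capture(out, tracing=TRACING):
    """Copy trace_pipe to `out` line by line.

    On a live system trace_pipe blocks until events arrive, so this
    runs until interrupted.
    """
    with open(os.path.join(tracing, "trace_pipe")) as pipe:
        for line in pipe:
            out.write(line)
```

To use it, call enable_tracing("sdb") with your own device name in place
of "sdb" (a placeholder), then capture(sys.stdout) with output redirected
to a file, and reproduce the stall. Block and workqueue events then show
up interleaved in the same log with the same clock.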

thanks
Vivek