Re: multi-second application stall in open()

From: Rakesh Iyer
Date: Thu Jun 21 2012 - 17:48:33 EST


-- Resending because my mail went out as HTML and got bounced by
the list; apologies if you see it twice --

Hello,

I coded up the watchdog and dropped it in, but never got the time to
go looking for evidence of stalls, so I have no confirmed evidence of
what the cause was.

Chad and I did manage to stare at the code long and hard, and we sort
of convinced ourselves that cfq_cfqq_wait_busy and its associated logic
could be the cause of the stall (strictly in my opinion, that logic can
be fully folded into the idling logic, but that's a discussion for
another day).

Hope that helps.
-Rakesh

On Thu, Jun 21, 2012 at 2:32 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Thu, Jun 21, 2012 at 04:28:24PM -0500, Josh Hunt wrote:
> > When you say the code has diverged from upstream, do you mean from 3.0
> > to 3.5?
>
> It's based on something diverged from 2.6.X, so an ancient thing.
>
> > Or maybe I'm misunderstanding what you're getting at. Also, if
> > you have any links to the watchdog timer code you're referring to I
> > would appreciate it.
>
> Rakesh is the one who observed the bug and wrote the watchdog code.
> Rakesh, I think Josh is seeing a similar cfqq hang issue.  Did the
> watchdog code reveal why that happened?  Or was it mainly to just kick
> the queue and keep it going?
>
> Thanks.
>
> --
> tejun