Re: [PATCH] sched: update_rq_clock() must skip ONE update

From: Mike Galbraith
Date: Mon Mar 31 2014 - 14:27:48 EST


On Mon, 2014-03-31 at 09:13 -0700, Linus Torvalds wrote:
> On Sun, Mar 30, 2014 at 9:20 PM, Mike Galbraith
> <umgwanakikbuti@xxxxxxxxx> wrote:
> >
> > Point of being verbose was to make sure it was clear exactly how this
> > harmless little bug selectively kills large IO boxen..
>
> My point is that if you want it to be applied hours before I make a
> release, I need to be made aware of how critical it is.

Oh, I didn't cc you because I wanted it applied instantly as
ultra-critical; I cc'd you only because the chain of events might be of
interest.

It takes a lot of cycles to add up to an NMI. Those cycles exist with or
without the throttle being fooled into picking on the watchdog. How bad can
wakeup latency get with modprobe mptsas? So bad that you don't even
need this little bug to _further_ incapacitate the watchdog? Can the
wakeup latency do the job all by itself? It's wakeup latency that is
being improperly attributed to the watchdog in the trace data.

(Then there's the question "is the watchdog being subject to the throttle
a good idea?")

> The data/commentary in the commit message made *zero* sense to me in
> that regards. It was just noise.

One of my sisters says I speak Martian; she may be right. It looks clear
to me, but then I did the tracing, condensed the output and hastily
wrote the apparently useless words... perhaps a tad too hastily.

I haven't yet received confirmation that this is the fix, so there may
be more to it, with this being only a part. A huge interrupt hit at the
right time with no irq accounting enabled could properly trigger the
throttle, but it'd be difficult to reliably hit such thin targets on
multiple CPUs.

-Mike
