Re: Debugging Thinkpad T430s occasional suspend failure.

From: Linus Torvalds
Date: Thu Feb 14 2013 - 21:10:13 EST


On Thu, Feb 14, 2013 at 5:15 PM, Dave Jones <davej@xxxxxxxxxx> wrote:
>
> Given I never saw this on a Fedora kernel, just my self-built ones, I eventually
> gave up on bisecting code, and switched to bisecting config options.
> I should have started this way, as I figured it out within an hour.
>
> The 3.7 merge window is when I started seeing this, and here's what got
> introduced during that time:
>
> commit e3ebfb96f396731ca2d0b108785d5da31b53ab00
> Author: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
> Date: Mon Jul 2 14:42:01 2012 -0700
>
> rcu: Add PROVE_RCU_DELAY to provoke difficult races
>
> 'difficult' is an understatement. This explains why some of those 'good'
> bisects survived 100 suspends on one day, and failed the next.
>
> Unfortunately, I don't think there's any sane way to retrieve whatever debug
> info might be getting spewed.

Hmm. I have to say, that's a particularly unhelpful config option. It
may make races much easier to hit, but when you do hit them, what are
the symptoms of said race?

Paul? Apparently you end up with a dead machine, at least during resume,
and no oops, which isn't very helpful. Maybe there is some BUG_ON() in
the RCU code somewhere?

So Paul, if you know the common symptoms of the bug that this debug
option helps trigger, is there some way to make them less lethal and
still print out useful information?

Linus