Re: Debugging Thinkpad T430s occasional suspend failure.
From: Paul E. McKenney
Date: Sun Feb 17 2013 - 14:39:56 EST

On Sat, Feb 16, 2013 at 11:46:59AM -0800, Linus Torvalds wrote:
> On Sat, Feb 16, 2013 at 11:25 AM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Sorry for the delay in testing this, but there was a need to upgrade
> > my laptop, and bozo here figured "why not go to 64 bits while I am at
> > it?" -- and then proceeded to learn the hard way that it is necessary
> > to do "make mrproper" before doing a build in 64-bit mode. :-/
>
> Hmm. Our object file dependency check includes checking that the
> compiler options are the same, but that's only true for normal C
> files. Some of the other rules do *not* test the full range of config
> options, so in general, if you change architecture etc models, you do
> indeed want to make sure that you do a "make distclean" (aka "make
> mrproper") or something like "git clean -dqfx".
>
> For a number of other files, we just depend on the normal make
> timestamp logic, which means that "if the object file is newer than
> the sources", we'll trust it. Which obviously doesn't work for cases
> where the object file may have been generated under totally different
> architecture rules..
>
> (That said, what kind of old environment did you do this in?
> stub32_sigaltstack was removed during the merge window, so I'm
> assuming you applied my patch on top of plain 3.7 or something?)

This was in a git tree at 3.7-rc7. And stub32_sigaltstack is now gone,
but perhaps I did something stupid that made it persist.

Ah, the previous time I did a build directly out of this git tree
might well have been before 3.8-rc, so maybe the .o file was from before?
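
For illustration only, here is a minimal sketch of why a timestamp-only
check can keep a stale .o file while a check that also compares the
recorded build command does not. This is a simplified model, not the
actual kernel build system code; the function names, file names, and
compiler command strings are all hypothetical.

```python
import os
import tempfile


def stale_by_timestamp(obj, sources):
    """make-style check: rebuild only if obj is missing or older than a source."""
    if not os.path.exists(obj):
        return True
    obj_mtime = os.path.getmtime(obj)
    return any(os.path.getmtime(src) > obj_mtime for src in sources)


def stale_by_command(obj, sources, command, saved_command):
    """Stricter check: also rebuild when the recorded build command differs."""
    return saved_command != command or stale_by_timestamp(obj, sources)


def demo():
    """Return (timestamp verdict, command verdict) for an object built 32-bit."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "stub.S")   # hypothetical source file
        obj = os.path.join(tmp, "stub.o")   # hypothetical stale object
        for path in (src, obj):
            with open(path, "w") as f:
                f.write("placeholder\n")
        # Make the object look newer than its source.
        os.utime(src, (1000, 1000))
        os.utime(obj, (2000, 2000))
        old_cmd = "cc -m32 -c stub.S"       # hypothetical earlier 32-bit command
        new_cmd = "cc -m64 -c stub.S"       # hypothetical current 64-bit command
        return (stale_by_timestamp(obj, [src]),
                stale_by_command(obj, [src], new_cmd, old_cmd))
```

In this model the timestamp check trusts the old object because it is
newer than its source, while the command comparison flags it for a
rebuild. For files covered only by the timestamp rule, a full clean
("make mrproper" or "git clean -dqfx") is the safe way to switch
architectures.
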
> > The kernel build system's way of telling you this at the moment is:
> >
> > arch/x86/built-in.o:(.rodata+0x4990): undefined reference to `stub32_sigaltstack'
>
> Adding Peter Anvin to the people, just in case he sees what's wrong
> with the system call stub generation that keeps excessively old object
> files around. If it's easy to fix, it might be worth trying to make it
> ok to switch from i386 to x86-64 and back in the same tree.
>
> Peter? Not a big deal, but if you see something obvious, let's just
> try to fix it, ok?
>
> > Anyway, with this patch, I see CPU stall warnings when running rcutorture
> > as shown below. This is not a hard failure:
>
> Yeah, there's something wrong with the patch, I didn't bother trying
> to figure it out for now. It also causes a hard failure with lockdep
> (or lock proving/debugging, I'm not sure which one triggered it) - and
> it happens too early to even see anything on the screen.

Glad that it is not just me, then. ;-)

> So I'd like to make that "downgrade from hardirq to softirq" atomic,
> and I think it would clean up the crazy code too (currently it does a
> *lot* of back-and-forth on the preempt flags), but I clearly missed
> some case where we used a wrapper or two to add some tracepoint or a
> RCU scheduling point. And I'm not going to worry about it right now,
> since I'm preparing to make v3.8 soon.
>
> But if somebody spots the bug, holler.

I must confess that your patch looked OK to me...

Thanx, Paul