Re: [PATCH 00/19] x86/dumpstack: rewrite x86 stack dump code

From: Linus Torvalds
Date: Fri Jul 22 2016 - 20:22:31 EST


On Fri, Jul 22, 2016 at 6:21 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
> Some of its advantages:
>
> - simplicity: no more callback sprawl and less code duplication.
>
> - flexibility: allows the caller to stop and inspect the stack state at
> each step in the unwinding process.
>
> - modularity: the unwinder code, console stack dump code, and stack
> metadata analysis code are all better separated so that changing one
> of them shouldn't have much of an impact on any of the others.
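
For illustration, a caller-driven unwinder in that style might be
consumed roughly like the user-space sketch below; the names and the
struct layout are made up here (not taken from the series), and it is
just a plain x86-64 frame-pointer walk:

/*
 * Purely illustrative sketch of a caller-driven unwind loop.
 * Build with: gcc -O0 -fno-omit-frame-pointer unwind.c
 */
#include <stdio.h>
#include <stdint.h>

struct unwind_state {
	unsigned long *bp;		/* current frame pointer */
};

static void unwind_start(struct unwind_state *state, void *bp)
{
	state->bp = bp;
}

static int unwind_done(const struct unwind_state *state)
{
	return state->bp == NULL;
}

static unsigned long
unwind_get_return_address(const struct unwind_state *state)
{
	return state->bp[1];	/* return address sits above the saved %rbp */
}

static void unwind_next_frame(struct unwind_state *state)
{
	unsigned long *next = (unsigned long *)state->bp[0];

	/* a sane frame chain only ever moves toward the stack base */
	if (next <= state->bp || ((uintptr_t)next & (sizeof(long) - 1)))
		next = NULL;
	state->bp = next;
}

static void show_trace(void)
{
	struct unwind_state state;

	/* the caller owns the loop, so it can stop or filter at any frame */
	for (unwind_start(&state, __builtin_frame_address(0));
	     !unwind_done(&state);
	     unwind_next_frame(&state))
		printf("  [<%016lx>]\n", unwind_get_return_address(&state));
}

static void level2(void) { show_trace(); }
static void level1(void) { level2(); }

int main(void)
{
	level1();
	return 0;
}

The only point of that shape is that the loop lives in the caller,
which is where the claimed flexibility comes from.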

I've been without internet for the last week, so I have a ton pending,
and I still don't have a good enough connection to take a proper look.

However, I want to make one thing really really clear: the absolute
NUMBER ONE requirement for the stack tracing code is none of the
above.

The #1 requirement is that it works, and does not have a chance in hell of
ever breaking. We had that happen once before when people wanted to
make it fancy and add Dwarf info, and it was such a f*cking disaster
that I am not sure I ever want to do that again. Seriously.

It does not matter if the stack tracing gives the wrong answers.

It does not matter if the stack tracing is complicated and odd old code.

It does not matter one whit if some new user is inconvenienced, and in
fact it is possible that new users should write their *own* stack
tracer code.

The ONLY thing that matters (to a very high degree) is that the code
is stable, and if an Oops happens, the stack tracer never *ever*
causes even more problems than we already have.

If the stack tracer *ever* takes a recursive fault and kills the
machine, the stack tracer is worse than bad - we'd be better off
*without* a stack tracer at all.

And yes, we had exactly that situation, where bugs in the stack tracer
meant that other bugs ended up being much harder to debug, because
instead of a nice logged oops message that would have been trivial to
figure out, we very occasionally ended up with a dead machine.

So without having yet looked at the code, I want people to understand
that to a very real degree, the stack tracer used by the *oopsing* code
(ie by all the usual kernel fault handlers) is very special code that
needs to be handled very carefully, and needs to be extra
robust, even in the presence of stack corruption, and even in the
presence of the dwarf info being totally corrupted. Because we've very
much had both things happen.
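
To put it concretely: in that path every single dereference has to be
treated as hostile.  A purely illustrative check, with made-up names,
is something like the following - reject a candidate frame pointer
unless it is aligned, inside the known stack bounds, and strictly
closer to the stack base than the previous one:

#include <stdbool.h>
#include <stdint.h>

static bool frame_is_sane(const unsigned long *bp,
			  const unsigned long *prev_bp,
			  const unsigned long *stack_low,
			  const unsigned long *stack_high)
{
	if ((uintptr_t)bp & (sizeof(long) - 1))
		return false;		/* misaligned: assume corruption */
	if (bp < stack_low || bp + 1 >= stack_high)
		return false;		/* points outside this stack */
	if (prev_bp && bp <= prev_bp)
		return false;		/* must move toward the stack base */
	return true;
}

And even then, the access itself would want a probe_kernel_read()-style
guard in the kernel, so that a wild pointer can never turn the original
oops into a second fault.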

It is very possible that we should have two different stack tracers -
the stupid "for oopses only" code that doesn't necessarily give the
perfect trace, but is very anal and happily gives old stale addresses
(which can be very useful for seeing what happened just before the
"real" stack trace), and then a separate stack trace engine that is
clever and gets things right; if that one faults, it can depend on
the normal kernel fault handling picking up the pieces.
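
Just to make the "stupid" half concrete, it can be little more than
this kind of stand-alone sketch: scan every word on the stack, print
anything that lands in kernel text, and flag it with '?' because it may
well be a stale leftover rather than a live return address (which is
roughly the spirit of the '?' entries the current dumper prints).  The
range constants and names below are stand-ins, not real kernel symbols:

#include <stdio.h>

#define TEXT_START	0xffffffff81000000UL	/* stand-in for _stext */
#define TEXT_END	0xffffffff82000000UL	/* stand-in for _etext */

static int looks_like_kernel_text(unsigned long addr)
{
	return addr >= TEXT_START && addr < TEXT_END;
}

static void dump_stack_dumb(const unsigned long *sp,
			    const unsigned long *stack_end)
{
	for (; sp < stack_end; sp++) {
		if (looks_like_kernel_text(*sp))
			printf(" ? [<%016lx>]\n", *sp);
	}
}

No metadata, no cleverness, nothing that can recurse or fault on a
corrupted frame chain - which is exactly why it is the variant you
want in the oops path.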

Yes, the current stack tracer is crufty. No, it's not perfect. But it
is very well tested, and has held up. That should not be dismissed.

Linus