Re: [PATCH 00/19] x86/dumpstack: rewrite x86 stack dump code

From: Andy Lutomirski
Date: Fri Jul 22 2016 - 20:32:26 EST


On Fri, Jul 22, 2016 at 5:22 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> So without having yet looked at the code, I want people to understand
> that to a very real degree, the stack tracer that the *oopsing* code
> (ie what all the usual kernel fault handlers use) is very very special
> code and needs to be handled very carefully, and needs to be extra
> robust, even in the presence of stack corruption, and even in the
> presence of the dwarf info being totally corrupted. Because we've very
> much had both things happen.
>
> It is very possible that we should have two different stack tracers -
> the stupid "for oopses only" code that doesn't necessarily give the
> perfect trace, but is very anal and happily gives old stale addresses
> (which can be very useful for seeing what happened just before the
> "real" stack trace), and then a separate stack trace engine that is
> clever and gets things right, and if that one faults it can depend on
> the normal kernel fault handling picking up the pieces.

I think that Josh's code has the potential to be extremely robust
*and* give more correct results when possible. One thing I intend to
review when v2 shows up is that it's as conservative as it needs to be
to avoid ever dereferencing an out-of-bounds pointer. And Josh's oops
printer carefully walks and prints out all addresses on the stack
(complete with question marks) even if the unwinder doesn't find them.

>
> Yes, the current stack tracer is crufty. No, it's not perfect. But it
> is very well tested, and has held up. That should not be dismissed.
>

I think you may be giving the current tracer slightly more credit than
it's due. In my stack guard page patchset, I fixed two separate
issues, one of which caused recursive faults and one of which caused
it to output nothing at all. So maybe *now* it's very robust :) But
it's still an umaintainable mess IMO, and Josh's patchset helps a
*lot*.

--Andy