Re: [PATCH 3/4] x86: Rewrite switch_to() code

From: Andy Lutomirski
Date: Mon May 23 2016 - 13:04:20 EST


On Mon, May 23, 2016 at 9:46 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> On Mon, May 23, 2016 at 06:49:03AM -0500, Josh Poimboeuf wrote:
>> On Mon, May 23, 2016 at 06:47:22AM -0500, Josh Poimboeuf wrote:
>> > On Mon, May 23, 2016 at 07:14:14AM -0400, Brian Gerst wrote:
>> > > On Sun, May 22, 2016 at 10:34 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>> > > > On Sun, May 22, 2016 at 10:59:38AM -0700, Andy Lutomirski wrote:
>> > > >> cc: Josh Poimboeuf: do you care about the exact stack layout of the
>> > > >> bottom of the stack of an inactive task?
>> > > >
>> > > > So there's one minor issue with this patch, relating to unwinding the
>> > > > stack of a newly forked task. For detecting reliable stacks, the
>> > > > unwinder needs to unwind all the way to the syscall pt_regs to make sure
>> > > > the stack is sane. But for newly forked tasks, that won't be possible
>> > > > here because the unwinding will stop at the fork_frame instead.
>> > > >
>> > > > So from an unwinder standpoint it might be nice for copy_thread_tls() to
>> > > > place a frame pointer on the stack next to the ret_from_fork return
>> > > > address, so that it would resemble an actual stack frame. The frame
>> > > > pointer could probably just be hard-coded to zero. And then the first
>> > > > bp in fork_frame would need to be a pointer to it instead of zero. That
>> > > > would make it nicely resemble the stack of any other task.
>> > > >
>> > > > Alternatively I could teach the unwinder that if the unwinding starts at
>> > > > the fork_frame offset from the end of the stack page, and the saved rbp
>> > > > is zero, it can assume that it's a newly forked task. But that seems a
>> > > > little more brittle to me, as it requires the unwinder to understand
>> > > > more of the internal workings of the fork code.
>> > > >
>> > > > But overall I think this patch is a really nice cleanup, and other than
>> > > > the above minor issue it should be fine with my reliable unwinder, since
>> > > > rbp is still at the top of the stack.
>> > >
>> > > Ok, how about if it pushed RBP first, then we teach get_wchan() to add
>> > > the fixed offset from thread.sp to get bp? that way it don't have to
>> > > push it twice.
>> >
>> > In theory I like the idea, and it would work: the unwinder could just
>> > use the inactive_task_frame struct (as Andy suggested) to find the frame
>> > pointer.
>> >
>> > But I suspect it would break all existing unwinders, both in-tree and
>> > out-of-tree. The only out-of-tree one I know of is crash, not sure if
>> > there are more out there.
>>
>> I should mention it would only affect those unwinders which know how to
>> do sleeping kernel tasks. So generic tools like gdb wouldn't be
>> affected.
>
> [continuing my conversation with myself...]
>
> To clarify, I still think we should do it. The stack format of a
> sleeping task isn't exactly an ABI, and I wouldn't expect many tools to
> rely on it. I can help with the fixing of in-tree unwinders if needed.
>
> Or I could even do the moving of the frame pointer as a separate patch
> on top of this one, since it might cause breakage elsewhere.

Do you have any understanding of why there are so many unwinder
implementations? Your reliable unwinder seems to be yet another copy
of more or less the same code.

I'd like to see a single, high-quality unwinder implemented as a state
machine, along the lines of:

struct unwind_state state;
unwind_start_inactive_task(&state, ...); or
unwind_start_pt_regs(&state, regs); or whatever.
unwind_next_frame(&state);

where, after unwind_next_frame, state encodes whatever registers are
known (at least bp and ip, but all the GPRs would be nice and are
probably mandatory for DWARF) and an indication of whether this is a
real frame or a guessed frame (the things that currently show up as
'?').

--Andy