Re: [RFC PATCH v3 1/4] arm64: Introduce stack trace reliability checks in the unwinder

From: Josh Poimboeuf
Date: Tue May 04 2021 - 20:08:28 EST


On Tue, May 04, 2021 at 06:13:39PM -0500, Madhavan T. Venkataraman wrote:
>
>
> On 5/4/21 4:52 PM, Josh Poimboeuf wrote:
> > On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@xxxxxxxxxxxxxxxxxxx wrote:
> >> @@ -44,6 +44,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> >>  	unsigned long fp = frame->fp;
> >>  	struct stack_info info;
> >>
> >> +	frame->reliable = true;
> >> +
> >
> > Why set 'reliable' to true on every invocation of unwind_frame()?
> > Shouldn't it be remembered across frames?
> >
>
> This is mainly for debug purposes in case a caller wants to print the whole stack and also
> print which functions are unreliable. For livepatch, it does not make any difference. It will
> quit as soon as it encounters an unreliable frame.

Hm, ok. So 'frame->reliable' refers to the current frame, not the
entire stack.
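
If that reading is right, a caller that wants a whole-stack answer has to
fold the per-frame flags together itself. A minimal user-space sketch of
that idea follows; 'struct model_frame' and model_stack_reliable() are
hypothetical names for illustration, not anything in the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for arm64's struct stackframe. */
struct model_frame {
	unsigned long pc;
	bool reliable;	/* describes this frame only, not the whole stack */
};

/*
 * Fold the per-frame flags into a whole-stack verdict: the stack counts
 * as reliable only if every frame is. A livepatch-style caller can stop
 * at the first unreliable frame.
 */
static bool model_stack_reliable(const struct model_frame *frames, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!frames[i].reliable)
			return false;
	}
	return true;
}
```

A debug-style caller would instead keep walking and print the flag next to
each frame.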

> > Also, it looks like there are several error scenarios where it returns
> > -EINVAL but doesn't set 'reliable' to false.
> >
>
> I wanted to make a distinction between an error situation (like stack corruption where unwinding
> has to stop) and an unreliable situation (where unwinding can still proceed). E.g., when a
> stack trace is taken for informational purposes or debug purposes, the unwinding will try to
> proceed until either the stack trace ends or an error happens.

Ok, but I don't understand how that relates to my comment.

Why wouldn't a stack corruption like !on_accessible_stack() set
'frame->reliable' to false?

In other words: for livepatch purposes, how does the caller tell the
difference between hitting the final stack record -- which returns an
error with reliable 'true' -- and a stack corruption like
!on_accessible_stack(), which also returns an error with reliable
'true'? Surely the latter should be considered unreliable?
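
To make the ambiguity concrete, here is a rough user-space model, not the
real unwinder. All 'sketch_*' names are made up, and it assumes both stop
conditions return -EINVAL as described above. The second function shows one
possible way out (clearing the flag on corruption); it is only an
illustration, not a proposal for the exact fix:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes of one unwind step. */
enum sketch_stop { SKETCH_MORE, SKETCH_FINAL_RECORD, SKETCH_CORRUPT };

/* Hypothetical stand-in for the 'reliable' field of struct stackframe. */
struct sketch_frame {
	bool reliable;
};

/* Behaviour being questioned: the flag stays true for every kind of stop. */
static int sketch_unwind_as_posted(struct sketch_frame *frame,
				   enum sketch_stop what)
{
	frame->reliable = true;
	if (what == SKETCH_MORE)
		return 0;
	return -EINVAL;	/* final record and corruption look identical */
}

/* One possible alternative: corruption also clears the flag. */
static int sketch_unwind_flag_corruption(struct sketch_frame *frame,
					 enum sketch_stop what)
{
	frame->reliable = true;
	if (what == SKETCH_MORE)
		return 0;
	if (what == SKETCH_CORRUPT)
		frame->reliable = false;
	return -EINVAL;
}
```

With the first variant the caller sees the same (return value, reliable)
pair in both cases; with the second, the pair differs.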

--
Josh