Re: [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable

From: Madhavan T. Venkataraman
Date: Tue Mar 23 2021 - 08:47:03 EST




On 3/23/21 5:42 AM, Mark Rutland wrote:
> On Mon, Mar 15, 2021 at 11:57:56AM -0500, madvenka@xxxxxxxxxxxxxxxxxxx wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@xxxxxxxxxxxxxxxxxxx>
>>
>> EL1 exceptions can happen on any instruction including instructions in
>> the frame pointer prolog or epilog. Depending on where exactly they happen,
>> they could render the stack trace unreliable.
>>
>> If an EL1 exception frame is found on the stack, mark the stack trace as
>> unreliable.
>>
>> Now, the EL1 exception frame is not at any well-known offset on the stack.
>> It can be anywhere on the stack. In order to properly detect an EL1
>> exception frame the following checks must be done:
>>
>> - The frame type must be EL1_FRAME.
>>
>> - When the register state is saved in the EL1 pt_regs, the frame
>> pointer x29 is saved in pt_regs->regs[29] and the return PC
>> is saved in pt_regs->pc. These must match with the current
>> frame.
>
> Before you can do this, you need to reliably identify that you have a
> pt_regs on the stack, but this patch uses a heuristic, which is not
> reliable.
>
> However, instead you can identify whether you're trying to unwind
> through one of the EL1 entry functions, which tells you the same thing
> without even having to look at the pt_regs.
>
> We can do that based on the entry functions all being in .entry.text,
> which we could further sub-divide to split the EL0 and EL1 entry
> functions.
>

Yes. I will check the entry functions. But I still think that we should
not rely on just one check. The additional checks will make it robust.
So, I suggest that the return address be checked first. If that passes,
then we can be reasonably sure that there are pt_regs. Then, check
the other things in pt_regs.

>>
>> Interrupts encountered in kernel code are also EL1 exceptions. At the end
>> of an interrupt, the interrupt handler checks if the current task must be
>> preempted for any reason. If so, it calls the preemption code which takes
>> the task off the CPU. A stack trace taken on the task after the preemption
>> will show the EL1 frame and will be considered unreliable. This is correct
>> behavior as preemption can happen practically at any point in code
>> including the frame pointer prolog and epilog.
>>
>> Breakpoints encountered in kernel code are also EL1 exceptions. The probing
>> infrastructure uses breakpoints for executing probe code. While in the probe
>> code, the stack trace will show an EL1 frame and will be considered
>> unreliable. This is also correct behavior.
>>
>> Signed-off-by: Madhavan T. Venkataraman <madvenka@xxxxxxxxxxxxxxxxxxx>
>> ---
>> arch/arm64/include/asm/stacktrace.h | 2 +
>> arch/arm64/kernel/stacktrace.c | 57 +++++++++++++++++++++++++++++
>> 2 files changed, 59 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
>> index eb29b1fe8255..684f65808394 100644
>> --- a/arch/arm64/include/asm/stacktrace.h
>> +++ b/arch/arm64/include/asm/stacktrace.h
>> @@ -59,6 +59,7 @@ struct stackframe {
>> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> int graph;
>> #endif
>> + bool reliable;
>> };
>>
>> extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
>> @@ -169,6 +170,7 @@ static inline void start_backtrace(struct stackframe *frame,
>> bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
>> frame->prev_fp = 0;
>> frame->prev_type = STACK_TYPE_UNKNOWN;
>> + frame->reliable = true;
>> }
>>
>> #endif /* __ASM_STACKTRACE_H */
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index 504cd161339d..6ae103326f7b 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -18,6 +18,58 @@
>> #include <asm/stack_pointer.h>
>> #include <asm/stacktrace.h>
>>
>> +static void check_if_reliable(unsigned long fp, struct stackframe *frame,
>> + struct stack_info *info)
>> +{
>> + struct pt_regs *regs;
>> + unsigned long regs_start, regs_end;
>> +
>> + /*
>> + * If the stack trace has already been marked unreliable, just
>> + * return.
>> + */
>> + if (!frame->reliable)
>> + return;
>> +
>> + /*
>> + * Assume that this is an intermediate marker frame inside a pt_regs
>> + * structure created on the stack and get the pt_regs pointer. Other
>> + * checks will be done below to make sure that this is a marker
>> + * frame.
>> + */
>
> Sorry, but NAK to this approach specifically. This isn't reliable (since
> it can be influenced by arbitrary data on the stack), and it's far more
> complicated than identifying the entry functions specifically.
>

As I mentioned above, I agree that we should check the return address. But
just as a precaution, I think we should double check the pt_regs.

Is that OK with you? It does not take away anything or increase the risk in
anyway. I think it makes it more robust.

Thanks.

Madhavan