Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable

From: Madhavan T. Venkataraman
Date: Tue Mar 23 2021 - 09:40:02 EST




On 3/23/21 8:36 AM, Mark Rutland wrote:
> On Tue, Mar 23, 2021 at 07:56:40AM -0500, Madhavan T. Venkataraman wrote:
>>
>>
>> On 3/23/21 5:51 AM, Mark Rutland wrote:
>>> On Mon, Mar 15, 2021 at 11:57:57AM -0500, madvenka@xxxxxxxxxxxxxxxxxxx wrote:
>>>> From: "Madhavan T. Venkataraman" <madvenka@xxxxxxxxxxxxxxxxxxx>
>>>>
>>>> When CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled and tracing is activated
>>>> for a function, the ftrace infrastructure is called for the function at
>>>> the very beginning. Ftrace creates two frames:
>>>>
>>>> - One for the traced function
>>>>
>>>> - One for the caller of the traced function
>>>>
>>>> That gives a reliable stack trace while executing in the ftrace
>>>> infrastructure code. When ftrace returns to the traced function, the frames
>>>> are popped and everything is back to normal.
>>>>
>>>> However, in cases like live patch, execution is redirected to a different
>>>> function when ftrace returns. A stack trace taken while still in the ftrace
>>>> infrastructure code will not show the target function. The target function
>>>> is the real function that we want to track.
>>>>
>>>> So, if an FTRACE frame is detected on the stack, just mark the stack trace
>>>> as unreliable.
>>>
>>> To identify this case, please identify the ftrace trampolines instead,
>>> e.g. ftrace_regs_caller, return_to_handler.
>>>
>>
>> Yes. I will check this as part of the return address checking. IIUC, I need
>> to check for the inner labels that are defined at the point where the
>> instructions are patched for ftrace, e.g. ftrace_call and ftrace_graph_call.
>>
>> SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>> bl ftrace_stub <====================================
>>
>> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL) // ftrace_graph_caller();
>> nop <======= // If enabled, this will be replaced
>> // "b ftrace_graph_caller"
>> #endif
>>
>> For instance, the stack trace I got while tracing do_mmap() with the stack trace
>> tracer looks like this:
>>
>> ...
>> [ 338.911793] trace_function+0xc4/0x160
>> [ 338.911801] function_stack_trace_call+0xac/0x130
>> [ 338.911807] ftrace_graph_call+0x0/0x4
>> [ 338.911813] do_mmap+0x8/0x598
>> [ 338.911820] vm_mmap_pgoff+0xf4/0x188
>> [ 338.911826] ksys_mmap_pgoff+0x1d8/0x220
>> [ 338.911832] __arm64_sys_mmap+0x38/0x50
>> [ 338.911839] el0_svc_common.constprop.0+0x70/0x1a8
>> [ 338.911846] do_el0_svc+0x2c/0x98
>> [ 338.911851] el0_svc+0x2c/0x70
>> [ 338.911859] el0_sync_handler+0xb0/0xb8
>> [ 338.911864] el0_sync+0x180/0x1c0
>>
>>> It'd be good to check *exactly* when we need to reject, since IIUC when
>>> we have a graph stack entry the unwind will be correct from livepatch's
>>> PoV.
>>>
>>
>> The current unwinder already handles this case as follows:
>>
>> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> if (tsk->ret_stack &&
>> (ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
>> struct ftrace_ret_stack *ret_stack;
>> /*
>> * This is a case where function graph tracer has
>> * modified a return address (LR) in a stack frame
>> * to hook a function return.
>> * So replace it to an original value.
>> */
>> ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
>> if (WARN_ON_ONCE(!ret_stack))
>> return -EINVAL;
>> frame->pc = ret_stack->ret;
>> }
>> #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
>
> Beware that this handles the case where a function will return to
> return_to_handler, but doesn't handle unwinding from *within*
> return_to_handler, which we can't do reliably today, so that might need
> special handling.
>

OK. I will take a look at this.
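
For illustration only, here is a minimal user-space sketch of the kind of
check discussed above: reject a trace if any return address falls inside a
trampoline's text range. This is not kernel code. The names tramp_range,
pc_in_trampoline() and trace_is_reliable() are hypothetical, and real
trampoline bounds would have to come from the kernel's own symbols.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one trampoline's text range: [start, end). */
struct tramp_range {
	const char *name;	/* e.g. "return_to_handler" (label only) */
	uintptr_t start;	/* first address of the trampoline */
	uintptr_t end;		/* one past the last address */
};

/* Return true if pc lies inside any of the n listed ranges. */
static bool pc_in_trampoline(uintptr_t pc, const struct tramp_range *r,
			     size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (pc >= r[i].start && pc < r[i].end)
			return true;
	}
	return false;
}

/*
 * Toy reliability check (an assumption, not the patch's logic): a single
 * return address inside a trampoline marks the whole trace unreliable.
 */
static bool trace_is_reliable(const uintptr_t *pcs, size_t npcs,
			      const struct tramp_range *r, size_t n)
{
	for (size_t i = 0; i < npcs; i++) {
		if (pc_in_trampoline(pcs[i], r, n))
			return false;
	}
	return true;
}
```

The half-open range means an address equal to "end" is treated as outside
the trampoline, so adjacent functions are not rejected by mistake.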

>> Is there anything else that needs handling here?
>
> I wrote up a few cases to consider in:
>
> https://www.kernel.org/doc/html/latest/livepatch/reliable-stacktrace.html
>
> ... e.g. the "Obscuring of return addresses" case.
>
> It might be that we're fine so long as we refuse to unwind across
> exception boundaries, but it needs some thought. We probably need to go
> over each of the trampolines instruction-by-instruction to consider
> that.
>
> As mentioned above, within return_to_handler when we call
> ftrace_return_to_handler, there's a period where the real return address
> has been removed from the ftrace return stack, but hasn't yet been
> placed in x30, and wouldn't show up in a trace (e.g. if we could somehow
> hook the return from ftrace_return_to_handler).
>
> We might be saved by the fact we'll mark traces across exception
> boundaries as unreliable, but I haven't thought very hard about it. We
> might want to explicitly reject unwinds within return_to_handler in case
> it's possible to interpose ftrace_return_to_handler somehow.
>

OK. I will study the above.
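
To make the window described above concrete, here is a toy user-space model
(not kernel code; struct toy_task and the toy_*() helpers are hypothetical).
It shows a return address being invisible between the point where it is
popped from a shadow return stack and the point where it is written to a
stand-in for x30.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RET_STACK_DEPTH 8

/* Toy model of a task: a shadow return stack plus a link register. */
struct toy_task {
	uintptr_t ret_stack[RET_STACK_DEPTH];
	int curr;	/* index of the top entry, -1 when empty */
	uintptr_t lr;	/* stands in for x30 */
};

/* Step 1: pop the real return address off the shadow stack. */
static uintptr_t toy_pop_ret(struct toy_task *t)
{
	return t->ret_stack[t->curr--];
}

/* Step 2: place the real return address in the link register. */
static void toy_restore_lr(struct toy_task *t, uintptr_t ret)
{
	t->lr = ret;
}

/*
 * Can an observer that inspects only the live shadow stack entries and
 * the link register find addr?
 */
static bool toy_visible(const struct toy_task *t, uintptr_t addr)
{
	if (t->lr == addr)
		return true;
	for (int i = 0; i <= t->curr; i++) {
		if (t->ret_stack[i] == addr)
			return true;
	}
	return false;
}
```

Between toy_pop_ret() and toy_restore_lr() the address exists only in the
caller's local variable, which is the gap an unwinder would not see.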

Thanks.

Madhavan