Re: [RFC PATCH v4 1/2] arm64: Introduce stack trace reliability checks in the unwinder

From: Madhavan T. Venkataraman
Date: Fri May 21 2021 - 15:42:00 EST




On 5/21/21 2:16 PM, Josh Poimboeuf wrote:
> On Fri, May 21, 2021 at 02:11:45PM -0500, Josh Poimboeuf wrote:
>> On Fri, May 21, 2021 at 01:59:16PM -0500, Madhavan T. Venkataraman wrote:
>>>
>>>
>>> On 5/21/21 1:48 PM, Josh Poimboeuf wrote:
>>>> On Fri, May 21, 2021 at 06:53:18PM +0100, Mark Brown wrote:
>>>>> On Fri, May 21, 2021 at 12:47:13PM -0500, Madhavan T. Venkataraman wrote:
>>>>>> On 5/21/21 12:42 PM, Mark Brown wrote:
>>>>>
>>>>>>> Like I say we may come up with some use for the flag in error cases in
>>>>>>> future so I'm not opposed to keeping the accounting there.
>>>>>
>>>>>> So, should I leave it the way it is now? Or should I not set reliable = false
>>>>>> for errors? Which one do you prefer?
>>>>>
>>>>>> Josh,
>>>>>
>>>>>> Are you OK with not flagging reliable = false for errors in unwind_frame()?
>>>>>
>>>>> I think it's fine to leave it as it is.
>>>>
>>>> Either way works for me, but if you remove those 'reliable = false'
>>>> statements for stack corruption then, IIRC, the caller would still have
>>>> some confusion between the end of stack error (-ENOENT) and the other
>>>> errors (-EINVAL).
>>>>
>>>
>>> I will leave it the way it is. That is, I will do reliable = false on errors
>>> like you suggested.
>>>
>>>> So the caller would have to know that -ENOENT really means success.
>>>> Which, to me, seems kind of flaky.
>>>>
>>>
>>> Actually, that is why -ENOENT was introduced - to indicate successful
>>> stack trace termination. A return value of 0 is for continuing with
>>> the stack trace. A non-zero value is for terminating the stack trace.
>>>
>>> So, either we return a positive value (say 1) to indicate successful
>>> termination. Or, we return -ENOENT to say no more stack frames left.
>>> I guess -ENOENT was chosen.
>>
>> I see. So it's a tri-state return value, and frame->reliable is
>> intended to be a private interface not checked by the callers.
>
> Or is frame->reliable supposed to be checked after all? Looking at the
> code again, I'm not sure.
>
> Either way it would be good to document the interface more clearly in a
> comment above the function.
>

So, arch_stack_walk_reliable() would do this:

	start_backtrace(frame);

	while (...) {
		if (!frame->reliable)
			return error;

		consume_entry(...);

		ret = unwind_frame(...);

		if (ret)
			break;
	}

	if (ret == -ENOENT)
		return success;
	return error;


Something like that.
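
To make the return-value contract concrete, here is a minimal user-space sketch of the loop above. It is not the arm64 code: sketch_frame, sketch_kind, sketch_unwind_frame() and sketch_walk_reliable() are hypothetical names, and the stack is simulated as an array of frame kinds. It only illustrates the three outcomes discussed in this thread: 0 means keep walking, -ENOENT means the final frame was reached, and any other error means the trace cannot be trusted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame kinds used to simulate a stack; not kernel types. */
enum sketch_kind { SK_NORMAL, SK_UNRELIABLE, SK_CORRUPT };

/* Hypothetical, simplified stand-in for the unwinder's frame state. */
struct sketch_frame {
	size_t idx;	/* position in the simulated frame chain */
	bool reliable;	/* cleared when the current frame cannot be trusted */
};

/*
 * Advance to the next simulated frame.
 *
 * Returns:
 *   0        - moved to the next frame, the caller should continue
 *   -ENOENT  - no frames left, i.e. normal termination
 *   -EINVAL  - simulated stack corruption; frame->reliable is also cleared
 */
static int sketch_unwind_frame(struct sketch_frame *frame,
			       const enum sketch_kind *chain, size_t n)
{
	if (frame->idx + 1 >= n)
		return -ENOENT;

	frame->idx++;

	if (chain[frame->idx] == SK_CORRUPT) {
		frame->reliable = false;
		return -EINVAL;
	}

	if (chain[frame->idx] == SK_UNRELIABLE)
		frame->reliable = false;

	return 0;
}

/*
 * Reliable walk: returns 0 only if every frame was reliable and the
 * unwinder stopped with -ENOENT. *consumed counts the entries that
 * would have been handed to consume_entry().
 */
static int sketch_walk_reliable(const enum sketch_kind *chain, size_t n,
				size_t *consumed)
{
	struct sketch_frame frame = { .idx = 0, .reliable = true };
	int ret;

	*consumed = 0;
	if (n == 0)
		return -EINVAL;

	/* Plays the role of start_backtrace(): judge the first frame. */
	frame.reliable = (chain[0] == SK_NORMAL);

	for (;;) {
		if (!frame.reliable)
			return -EINVAL;

		(*consumed)++;

		ret = sketch_unwind_frame(&frame, chain, n);
		if (ret)
			break;
	}

	return ret == -ENOENT ? 0 : -EINVAL;
}
```

In this sketch only the walker interprets -ENOENT as success, so whoever calls the reliable walk sees a plain 0-or-error result and never has to know about the tri-state value.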

I will add a comment about all of this in the unwinder.

Thanks!

Madhavan