Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable

From: Song Liu
Date: Tue Mar 18 2025 - 19:38:40 EST


On Tue, Mar 18, 2025 at 4:00 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
> On Tue, Mar 18, 2025 at 01:14:40PM -0700, Song Liu wrote:
> > >
> > > See for example all the error conditions in the x86 version of
> > > arch_stack_walk_reliable() and in arch/x86/kernel/unwind_frame.c.
> >
> > I guess I missed this part:
> >
> > diff --git i/arch/arm64/kernel/stacktrace.c w/arch/arm64/kernel/stacktrace.c
> > index 69d0567a0c38..3bb8e3ea4c4b 100644
> > --- i/arch/arm64/kernel/stacktrace.c
> > +++ w/arch/arm64/kernel/stacktrace.c
> > @@ -268,6 +268,8 @@ kunwind_next(struct kunwind_state *state)
> > case KUNWIND_SOURCE_TASK:
> > case KUNWIND_SOURCE_REGS_PC:
> > err = kunwind_next_frame_record(state);
> > + if (err && err != -ENOENT)
> > + state->common.unreliable = true;
> > break;
> > default:
> > err = -EINVAL;
>
> I still see some issues:
>
> - do_kunwind() -> kunwind_recover_return_address() can fail
>
> - do_kunwind() -> consume_state() can fail

Great points! I have got the following:

diff --git i/arch/arm64/kernel/stacktrace.c w/arch/arm64/kernel/stacktrace.c
index 69d0567a0c38..852e4b322209 100644
--- i/arch/arm64/kernel/stacktrace.c
+++ w/arch/arm64/kernel/stacktrace.c
@@ -139,6 +139,7 @@ kunwind_recover_return_address(struct kunwind_state *state)
(void *)state->common.fp);
if (state->common.pc == orig_pc) {
WARN_ON_ONCE(state->task == current);
+ state->common.unreliable = true;
return -EINVAL;
}
state->common.pc = orig_pc;
@@ -268,6 +269,8 @@ kunwind_next(struct kunwind_state *state)
case KUNWIND_SOURCE_TASK:
case KUNWIND_SOURCE_REGS_PC:
err = kunwind_next_frame_record(state);
+ if (err && err != -ENOENT)
+ state->common.unreliable = true;
break;
default:
err = -EINVAL;
@@ -404,12 +407,16 @@ static __always_inline bool
arch_kunwind_reliable_consume_entry(const struct kunwind_state
*state, void *cookie)
{
struct kunwind_reliable_consume_entry_data *data = cookie;
+ bool ret;

if (state->common.unreliable) {
data->unreliable = true;
return false;
}
- return data->consume_entry(data->cookie, state->common.pc);
+ ret = data->consume_entry(data->cookie, state->common.pc);
+ if (!ret)
+ data->unreliable = true;
+ return ret;
}

noinline noinstr int arch_stack_walk_reliable(stack_trace_consume_fn
consume_entry,


> - even in the -ENOENT case the unreliable bit has already been set
> right before the call to kunwind_next_frame_record_meta().

For this one, do you mean we set state->common.unreliable, but
failed to propagate it to data.unreliable?

Thanks,
Song