Re: [RFC PATCH v2 2/8] arm64: Implement frame types

From: Mark Brown
Date: Fri Mar 19 2021 - 09:23:08 EST

On Thu, Mar 18, 2021 at 05:22:49PM -0500, Madhavan T. Venkataraman wrote:
> On 3/18/21 12:40 PM, Mark Brown wrote:

> > Unless I'm misreading what's going on here this is more trying to set a
> > type for the stack as a whole than for a specific stack frame. I'm also
> > finding this a bit confusing as the unwinder already tracks things it
> > calls frame types and it handles types that aren't covered here like
> > SDEI. At the very least there's a naming issue here.

> Both these frames are on the task stack. So, it is not a stack type.

OTOH it's also not something that applies to every frame but only to the
base frame from each stack which I think was more where I was coming
from there. In any case, the issue is also that there's already another
thing that the unwinder calls a frame type so there's at least that
collision which needs to be resolved if nothing else.

> > Taking a step back though do we want to be tracking this via pt_regs?
> > It's reliant on us robustly finding the correct pt_regs and on having
> > the things that make the stack unreliable explicitly go in and set the
> > appropriate type. That seems like it will be error prone, I'd been
> > expecting to do something more like using sections to filter code for
> > unreliable features based on the addresses of the functions we find on
> > the stack or similar. This could still go wrong of course but there's
> > fewer moving pieces, and especially fewer moving pieces specific to
> > reliable stack trace.

> In that case, I suggest doing both. That is, check the type as well
> as specific functions. For instance, in the EL1 pt_regs, in addition
> to the above checks, check the PC against el1_sync(), el1_irq() and
> el1_error(). I have suggested this in the cover letter.

> If this is OK with you, we could do that. We want to make really sure that
> nothing goes wrong with detecting the exception frame.


> If you dislike the frame type, I could remove it and just do the
> following checks:

> FP == pt_regs->regs[29]
> PC == pt_regs->pc
> and the address check against el1_*() functions

> and similar changes for EL0 as well.

> I still think that the frame type check makes it more robust.

Yeah, we know the entry points so they can serve the same role as
checking an explicitly written value. It does mean one less operation
on exception entry, though I'm not sure that's that a big enough
overhead to actually worry about. I don't have *super* strong opinons
against adding the explicitly written value other than it being one more
thing we don't otherwise use which we have to get right for reliable
stack trace, there's a greater risk of bitrot if it's not something that
we ever look at outside of the reliable stack trace code.

> >> EL1_FRAME
> >> EL1 exception frame.

> > We do trap into EL2 as well, the patch will track EL2 frames as EL1
> > frames. Even if we can treat them the same the naming ought to be
> > clear.

> Are you referring to ARMv8.1 VHE extension where the kernel can run
> at EL2? Could you elaborate? I thought that EL2 was basically for
> Hypervisors.

KVM is the main case, yes - IIRC otherwise it's mainly error handlers
but I might be missing something. We do recommend that the kernel is
started at EL2 where possible.

Actually now I look again it's just not adding anything on EL2 entries
at all, they use a separate set of macros which aren't updated - this
will only update things for EL0 and EL1 entries so my comment above
about this tracking EL2 as EL1 isn't accurate.

Attachment: signature.asc
Description: PGP signature