Re: [GIT PULL] arm64 updates for 6.2

From: Ard Biesheuvel
Date: Tue Dec 13 2022 - 07:36:30 EST


l

On Tue, 13 Dec 2022 at 13:11, Will Deacon <will@xxxxxxxxxx> wrote:
>
> Hi Linus,
>
> [+Ard]
>
> On Mon, Dec 12, 2022 at 10:05:07AM -0800, Linus Torvalds wrote:
> > On Fri, Dec 9, 2022 at 3:25 AM Will Deacon <will@xxxxxxxxxx> wrote:
> > >
> > > Dynamic SCS:
> > > * Support for dynamic shadow call stacks to allow switching at
> > > runtime between Clang's SCS implementation and the CPU's
> > > pointer authentication feature when it is supported (complete
> > > with scary DWARF parser!)
> >
> > I've pulled this thing, but this part makes me nervous. There's some
> > bad history with debug information not being 100% reliable probably
> > simply because it gets very little correctness testing.
>
> Hey, I did use the word "scary"! This is, at least, very easy to back
> out (it's effectively an optimisation) if the DWARF info ends up being
> too unreliable and causes issues in practice. We're also only looking
> at .eh_frame here, which should hopefully get a lot more correctness
> testing when compared to the .debug sections due to exception unwinding.
>

Indeed. And this is Clang 15+ at the moment, for precisely this reason.

> > It might be worth thinking about at least verifying the information
> > using something like objtool, so that you at least catch problem cases
> > at *build* time rather than runtime.
>
> Checking that the DWARF data looks sensible at build time isn't a bad
> idea, but see below as I think we can probably still produce a functional
> kernel Image in this case.
>
> > For example, that whole
> >
> > default:
> > pr_err("unhandled opcode: %02x in FDE frame %lx\n",
> > opcode[-1], (uintptr_t)frame);
> > return -ENOEXEC;
> >
> > really makes me go "this should have been verified at build time, it's
> > much too late to notice now that you don't understand the dwarf data".
>
> This isn't actually as bad as it looks -- the patching operation here
> only kicks in on CPUs which do not implement the pointer authentication
> instructions (i.e. where the CPU executes these as NOPs). Therefore, if
> patching bails out half way due to the "unhandled opcode" above, we
> should be ok, albeit missing some SCS coverage.

Indeed.

> I say "should" because
> if we fail within a frame after patching in the SCS "push" but before
> patching in the "pop", then we'd end up with a corrupt SCS pointer.
>
> Ard -- do you think we could tweak the patching so that we patch the push
> and the pop together (e.g. by tracking the two locations on a per-frame
> basis and postponing the text poking until just before we return from
> scs_handle_fde_frame())?
>

The push and the pop are not necessarily balanced (there may be more
than one pop for each push), and the opcode we look for
(DW_CFA_negate_ra_state) may occur in places which are not actually a
pop, so tracking these is not as straight-forward as this.

What we could do is track the push and the first pop on a first pass,
and if we don't encounter any unexpected opcodes, patch the push and
do a second pass starting from the first pop. Or just simply run it
twice and do no patching the first time around (the DWARF frames are
not very big)