Re: [RFC PATCH 12/17] gcc-plugins: objtool: Add plugin to detect switch table on arm64

From: Julien Thierry
Date: Wed Feb 03 2021 - 03:12:57 EST




On 2/3/21 12:01 AM, Nick Desaulniers wrote:
On Tue, Feb 2, 2021 at 12:57 AM Julien Thierry <jthierry@xxxxxxxxxx> wrote:



On 2/2/21 12:17 AM, Nick Desaulniers wrote:
On Mon, Feb 1, 2021 at 1:44 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:

On Fri, Jan 29, 2021 at 10:10:01AM -0800, Nick Desaulniers wrote:
On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:

On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote:
From: Raphael Gault <raphael.gault@xxxxxxx>

This plugin comes into play before the final 2 RTL passes of GCC,
detects switch tables that are to be output in the ELF, and writes
information in a ".discard.switch_table_info" section which will be
used by objtool.

Signed-off-by: Raphael Gault <raphael.gault@xxxxxxx>
[J.T.: Change section name to store switch table information,
Make plugin Kconfig be selected rather than opt-in by user,
Add a relocation in the switch_table_info that points to
the jump operation itself]
Signed-off-by: Julien Thierry <jthierry@xxxxxxxxxx>

Rather than tightly couple this feature to a particular toolchain via
plugin, it might be nice to consider what features could be spec'ed out
for toolchains to implement (perhaps via a -f flag).

The problem is being able to detect switch statement jump table vectors.

For a given indirect branch (due to a switch statement), what are all
the corresponding jump targets?

We would need the compiler to annotate that information somehow.

Makes sense, the compiler should have this information. How is this
problem solved on x86?

Thus far we've been able to successfully reverse engineer it on x86,
though it hasn't been easy.

There were some particulars for arm64 which made doing so impossible.
(I don't remember the details.)

The main issue is that the tables for arm64 have more indirection than x86.

I wonder if PAC or BTI also make this slightly more complex? PAC at
least has implications for unwinders, IIUC.


On x86, the dispatching jump instruction fetches the target address from
a contiguous array of addresses based on a given offset. So the list of
potential targets of the jump is neatly organized in a table (and sure,
before link time these are just relocations, but still processable).

On arm64 (with GCC at least), what is stored in a table is an array of
candidate offsets from the jump instruction. And because arm64 is
limited to 32bit instructions, the encoding often requires multiple
instructions to compute the target address:

ldr<*> x_offset, [x_offsets_table, x_index, ...] // load offset
adr x_dest_base, <addr> // load address of the branch target for offset 0
add x_dest, x_dest_base, x_offset, ... // compute final address
br x_dest // jump

Where this gets trickier is that (with GCC) the offsets stored in the
table might or might not be signed constants (and this can be seen in
GCC intermediate representations, but I do not believe this information
is output in the final object file). On top of that, GCC might decide
to treat offsets that appear unsigned in the intermediate
representation as signed, by sign-extending them in the add
instruction.
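
To make the ambiguity concrete, here is a minimal sketch in C of how the
same table byte resolves to two different targets depending on whether
it is read as signed or unsigned. The helper name, the 8-bit entry width
and the shift amount are assumptions made for illustration; this is not
objtool code, and the real encoding depends on what the compiler emits.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute a jump target from one raw table entry.
 * Assumes 8-bit entries and a caller-supplied scaling shift.
 */
static uint64_t jump_target(uint64_t base, uint8_t raw, int is_signed,
                            unsigned int shift)
{
    int64_t offset = raw;

    /* Reinterpret the byte as a two's complement value if requested. */
    if (is_signed && raw >= 0x80)
        offset -= 0x100;

    /* Multiply rather than shift so negative offsets stay well defined. */
    return base + (uint64_t)(offset * ((int64_t)1 << shift));
}
```

With a base of 0x1000 and a shift of 2, the byte 0xF0 gives 0x13C0 when
read as unsigned and 0xFC0 when read as signed, so a tool that only sees
the table contents cannot pick the right target without knowing which
extension the add instruction applies.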

So, to handle this we'd have to track the different operations done with
the offset, from the load to the final jump, decoding the instructions
and deducing the potential target instructions from the table of offsets.

But that is error prone as we don't really know how many instructions
can be between the ones doing the address computation, and I remember
some messy case of a jump table inside a jump table where tracking the
instructions touching one or the other offset would need a lot of
corner case handling.

And this of course is just for GCC, I haven't looked at what it all
looks like on Clang's end.

Sure, but this is what production unwinders do, and they don't require
compiler plugins, right? I don't doubt unwinders can be made simpler
with changes to toolchain output; please work with your compiler
vendor on making such changes rather than relying on compiler plugins
to do so.


I think there is a small confusion. Neither the plugin nor the data it
generates is meant to be used by a kernel unwinder. They are here to
allow objtool to assess whether the code being checked can be unwound
reliably (without omitting functions). Part of this is checking that a
branch/jump in a function does not end up in code unrelated to the
function without setting up a call frame.

This is about static validation rather than functionality.

I think the details are pertinent to finding a portable solution. The
commit message of this commit in particular doesn't document such
details, such as why such an approach is necessary or how the data is
laid out for objtool to consume it.


Sorry, I will need to make that clearer. The next patch explains it a
bit [1]

Basically, for simplicity, the plugin creates a new section containing

Right, this takes a focus on simplicity, at the cost of alienating a toolchain.

Ard's point about 3193c0836f20 relating to -fgcse is that when
presented with tricky cases to unwind, the simplest approach is taken.
There it was disabling a compiler-specific optimization, here
it's either a compiler-specific plugin (or disabling another
compiler optimization). The pattern seems to be "Objtool isn't smart
enough" ... "compiler optimization disabled" or "compiler plugin
dependency."

tables (one per jump table) of references to the jump targets, similar
to what x86 has, except that in this case this table isn't actually used
by runtime code and is discarded at link time. I only chose this to
minimize what needed to be changed in objtool and because the format
seemed simple enough.
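
As a rough sketch of how a checker could consume such tables, the C
below models one table as a list of already-resolved target addresses
and verifies that every target stays inside the function containing the
jump. The struct, its field names and the function name are
hypothetical; the plugin's actual section layout is not described here
and may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one table, not the on-disk layout. */
struct switch_table_info {
    uint64_t jump_insn;       /* address of the dispatching branch */
    size_t nr_targets;        /* number of entries in targets[] */
    const uint64_t *targets;  /* one resolved address per entry */
};

/* Return true if every target lies in [func_start, func_end). */
static bool targets_in_function(const struct switch_table_info *t,
                                uint64_t func_start, uint64_t func_end)
{
    for (size_t i = 0; i < t->nr_targets; i++) {
        if (t->targets[i] < func_start || t->targets[i] >= func_end)
            return false;
    }
    return true;
}
```

Keeping the table as plain target references means a check like this
needs no instruction decoding of the address computation.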

But I'm open to alternatives, whether it's a -fjump-table-info

Yes, I think we could spec out something like that. But I would
appreciate revisiting open questions around stack validation (frame
pointers), preventing the generation of jump tables to begin with
(-fno-jump-tables) in place of making objtool more robust, or
generally the need to depend on compiler plugins.


I'll give it a try at least for the arm64 side.

Thanks,

--
Julien Thierry