Re: [PATCH v3 00/36] arm64/gcs: Provide support for GCS in userspace

From: Mark Brown
Date: Tue Aug 08 2023 - 17:14:41 EST


On Tue, Aug 08, 2023 at 02:38:58PM +0100, Will Deacon wrote:

> But seriously, I think the question is more about what this brings us
> *on top of* SCS, since for the forseeable future folks that care about
> this stuff (like Android) will be using SCS. GCS on its own doesn't make
> sense to me, given the recompilation effort to remove SCS and the lack
> of hardware, so then you have to look at what it brings in addition to
> GCS and balance that against the performance cost.

> Given that, is anybody planning to ship a distribution with this enabled?

I'm not sure that your assumption that the only people would would
consider deploying this are those who have deployed SCS is a valid one,
SCS users are definitely part of the mix but GCS is expected to be much
more broadly applicable. As you say SCS is very invasive, requires a
rebuild of everything with different code generated and as Szabolcs
outlined has ABI challenges for general distros. Any code built (or
JITed) with anything other than clang is going to require some explicit
support to do SCS (eg, the kernel's SCS support does nothing for
assembly code) and there's a bunch of runtime support. It's very much a
specialist feature, mainly practical in well controlled somewhat
vertical systems - I've not seen any suggestion that general purpose
distros are considering using it.

In contrast in the case of GCS one of the nice features is that for most
code it's very much non-invasive, much less so than things like PAC/BTI
and SCS, which means that the audience is much wider than it is for SCS
- it's a *much* easier sell for general purpose distros to enable GCS
than to enable SCS. For the majority of programs all the support that
is needed is in the kernel and libgcc/libc, there's no impact on the
code generation. There are no extra instructions in the normal flow
which will impact systems without the feature, and there are no extra
registers in use, so even if the binaries are run on a system without
GCS or for some reason someone decides that it's best to turn the
feature off on a system that is capable of using it the fact that it's
just using the existing bl/ret pairs means that there is minimal
overhead. This all means that it's much more practical to deploy in
general purpose distros. On the other hand when active it affects all
code, this improves coverage but the improved coverage can be a worry.

I can see that systems that have gone through all the effort of enabling
SCS might not rush to implement GCS, though there should be no harm in
having the two features running side by side beyond the doubled memory
requirements so you can at least have a transition plan (GCS does have
some allowances which enable hardware to mitigate some of the memory
bandwidth requirements at least). You do still get the benefit of the
additional hardware protections GCS offers, and the coverage of all
branch and ret instructions will be of interest both for security and
for unwinders. It's definitely offers less of an incremental
improvement on top of SCS than it is without SCS though.

GCS and SCS are comparable features in terms of the protection they aim
to add but their system integration impacts are different.

> If not, why are we bothering? If so, how much of that distribution has
> been brought up and how does the "dynamic linker or other startup code"
> decide what to do?

There is active interest in the x86 shadow stack support from distros,
GCS is a lot earlier on in the process but isn't fundamentally different
so it is expected that this will translate. There is also a chicken and
egg thing where upstream support gates a lot of people's interest, what
people will consider carrying out of tree is different to what they'll
enable. Architecture specific feedback on the implementation can also
be fed back into the still ongoing review of the ABI that is being
established for x86, there will doubtless be pushback about variations
between architectures from userspace people.

The userspace decision about enablement will primarily be driven by an
ELF marking which the dynamic linker looks at to determine if the
binaries it is loading can support GCS, a later dlopen() can either
refuse to load an additional library if the process currently has GCS
enabled, ignore the issue and hope things work out (there's a good
chance they will but obviously that's not safe) or (more complicatedly)
go round all the threads and disable GCS before proceeding. The main
reason any sort of rebuild is required for most code is to add the ELF
marking, there will be a compiler option to select it. Static binaries
should know if everything linked into them is GCS compatible and enable
GCS if appropriate in their startup code.

The majority of the full distro work at this point is on the x86 side
given the hardware availability, we are looking at that within Arm of
course. I'm not aware of any huge blockers we have encountered thus
far.

It is fair to say that there's less active interest on the arm64 side
since as you say the feature is quite a way off making it's way into
hardware, though there are also long lead times on getting the full
software stack to end users and kernel support becomes a blocker for
the userspace stack.

> After the mess we had with BTI and mprotect(), I'm hesitant to merge
> features like this without knowing that the ABI can stand real code.

The equivalent x86 feature is in current hardware[1], there has been
some distro work (I believe one of the issues x86 has had is coping with
a distro which shipped an early out of tree ABI, that experience has
informed the current ABI which as the cover letter says we are following
closely). AIUI the biggest blocker on userspace work for x86 right now
is landing the kernel side of things so that everyone else has a stable
ABI to work from and don't need to carry out of tree patches, I've heard
frustration expressed at the deployment being held up. IIRC Fedora were
on the leading edge in terms of active interest, they tend to be given
that they're one of the most quickly iterating distros.

This definitely does rely fairly heavily on the x86 experience for
confidence in the ABI, and to be honest one of the big unknowns at this
point is if you or Catalin will have opinions on how things are being
done.

[1] https://edc.intel.com/content/www/us/en/design/ipla/software-development-platforms/client/platforms/alder-lake-desktop/12th-generation-intel-core-processors-datasheet-volume-1-of-2/009/shadow-stack/

Attachment: signature.asc
Description: PGP signature