Re: [PATCH v5 11/17] arm64: docs: document pointer authentication

From: Ramana Radhakrishnan
Date: Wed Oct 24 2018 - 06:56:54 EST

On 19/10/2018 12:35, Catalin Marinas wrote:
On Tue, Oct 16, 2018 at 05:14:39PM +0100, Kristina Martsenko wrote:
On 05/10/2018 10:04, Ramana Radhakrishnan wrote:
On 05/10/2018 09:47, Kristina Martsenko wrote:
+Pointer authentication is not currently supported in KVM guests. KVM
+will mask the feature bits from ID_AA64ISAR1_EL1, and attempted use of
+the feature will result in an UNDEFINED exception being injected into
+the guest.

However, applications using instructions from the hint space will
continue to work, albeit without any protection (as they would just be
NOPs)?

Mostly, yes. If the guest leaves SCTLR_EL1.EnIA unset (and
EnIB/EnDA/EnDB), then PAC* and AUT* instructions in the HINT space will
execute as NOPs. If the guest sets EnIA, then PAC*/AUT* instructions
will trap and KVM will inject an "Unknown reason" exception into the
guest (which will cause a Linux guest to send a SIGILL to the application).

I think that part is fine. If KVM (a fairly recent version with CPUID
sanitisation) does not enable ptr auth, the CPUID should not advertise
this feature either so the guest kernel should not enable it. For the
above instructions in the HINT space, they will just be NOPs. If the
guest kernel enables the feature regardless of the CPUID information, it
deserves to get an "Unknown reason" exception.

In the latter case we could instead pretend the instruction was a NOP
and not inject an exception, but trapping twice in every function would
probably be terrible for performance. The guest shouldn't be setting
EnIA anyway if ID_AA64ISAR1_EL1 reports that pointer authentication is
not present (because KVM has hidden it).

I don't think we should. The SCTLR_EL1 bits are RES0 unless you know
that the feature is present via CPUID.
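
To make the "check CPUID before touching SCTLR_EL1" rule concrete, here is a minimal sketch of the decision a guest kernel would make. The field offsets (APA at bits 7:4, API at bits 11:8) and the SCTLR_EL1 bit positions are taken from the ARMv8.3 architecture as I understand it; the helper names are hypothetical and are not from this patch series.

```c
#include <stdint.h>

/* ID_AA64ISAR1_EL1 address-authentication fields (4 bits each). */
#define ISAR1_APA_SHIFT 4   /* architected algorithm */
#define ISAR1_API_SHIFT 8   /* IMPLEMENTATION DEFINED algorithm */

/* SCTLR_EL1 enable bits; RES0 when address authentication is absent. */
#define SCTLR_ENIA (UINT64_C(1) << 31)
#define SCTLR_ENIB (UINT64_C(1) << 30)
#define SCTLR_ENDA (UINT64_C(1) << 27)
#define SCTLR_ENDB (UINT64_C(1) << 13)

/* Extract one 4-bit ID register field. */
static inline unsigned int isar1_field(uint64_t isar1, unsigned int shift)
{
    return (unsigned int)((isar1 >> shift) & 0xf);
}

/* Hypothetical helper: true if either address-auth field is nonzero. */
static inline int has_address_auth(uint64_t isar1)
{
    return isar1_field(isar1, ISAR1_APA_SHIFT) != 0 ||
           isar1_field(isar1, ISAR1_API_SHIFT) != 0;
}

/*
 * Hypothetical helper: the SCTLR_EL1 enable bits a kernel may set.
 * If KVM has masked the ID fields, this returns 0 and the HINT-space
 * PAC and AUT instructions stay NOPs.
 */
static inline uint64_t sctlr_ptrauth_bits(uint64_t isar1)
{
    if (!has_address_auth(isar1))
        return 0;
    return SCTLR_ENIA | SCTLR_ENIB | SCTLR_ENDA | SCTLR_ENDB;
}
```

With the ID fields masked by KVM, `sctlr_ptrauth_bits()` yields 0, which is the "guest kernel should not enable it" case above.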

The other special case is the XPACLRI instruction, which is also in the
HINT space. Currently it will trap and KVM will inject an exception into
the guest. We should probably change this to NOP instead, as that's what
applications will expect. Unfortunately there is no EnIA-like control to
make it NOP.
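
The cases discussed so far can be summarised as a small truth table. The following is only a model of the behaviour described in this thread, assuming a CPU that implements pointer authentication; the enum and function names are made up for illustration, and `el2_traps` stands for the HCR_EL2 trap control being left in its trapping state.

```c
#include <stdbool.h>

/* Two classes of HINT-space instruction discussed in this thread. */
enum pa_insn {
    PA_HINT_PAC_AUT,   /* PAC and AUT forms in the HINT space */
    PA_HINT_XPACLRI    /* XPACLRI, which has no EnIA-like control */
};

enum pa_outcome {
    PA_NOP,            /* executes as a NOP */
    PA_TRAP_TO_EL2,    /* traps; KVM injects "Unknown reason" */
    PA_EXECUTES        /* performs the pointer auth operation */
};

/*
 * Model only (assumption: hardware implements pointer authentication).
 * el2_traps: the hypervisor has left the HCR_EL2 trap in place.
 * sctlr_en:  the guest has set the relevant SCTLR_EL1 enable bit.
 */
static inline enum pa_outcome pa_guest_outcome(enum pa_insn insn,
                                               bool el2_traps,
                                               bool sctlr_en)
{
    /* XPACLRI ignores the SCTLR_EL1 enables entirely. */
    if (insn == PA_HINT_XPACLRI)
        return el2_traps ? PA_TRAP_TO_EL2 : PA_EXECUTES;

    /* Disabled PAC/AUT hints are NOPs, whatever EL2 does. */
    if (!sctlr_en)
        return PA_NOP;

    return el2_traps ? PA_TRAP_TO_EL2 : PA_EXECUTES;
}
```

The problematic row is `pa_guest_outcome(PA_HINT_XPACLRI, true, false)`: the guest did everything right, yet the instruction still traps.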

Very good catch. Basically if EL2 doesn't know about ptr auth (older
distro), EL1 may or may not know but leaves SCTLR_EL1 disabled (based on
CPUID), the default HCR_EL2 is to trap (I'm ignoring EL3 as that's likely
to have ptr auth enabled, being built for the specific HW). So a user
app considering XPACLRI a NOP (or inoffensive) will get a SIGILL
(injected by the guest kernel following the injection of "Unknown
reason" exception by KVM).

Ramana, is XPACLRI commonly generated by gcc, and is it expected to be a NOP?
Could we restrict it to only being used at run-time if the corresponding
HWCAP is set? This means redefining this instruction as no longer in the
NOP space.

Sorry to have missed this - I'm still catching up on email.

XPACLRI is used in the unwinder in exactly 2 places, not for unwinding itself but for storing the actual return address in the data structures. It's not something I expect to be used very commonly, so a check there seems reasonable. The XPACLRI is considered a NOP in the architecture as it is defined today. I don't like the idea of redefining instructions as not in the NOP space after they have been defined as being so. We could investigate guarding the XPACLRI with a check on the HWCAP. How many unwinders would you like us to fix?
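
For illustration, a guarded XPACLRI in an unwinder could look roughly like the sketch below. This is not the libgcc code. The `sketch_` names are hypothetical, and the HWCAP bit value is an assumption (bit 30, which is what mainline Linux ended up using for address authentication; the name and value in this patch series may differ). On non-arm64 builds the helper simply returns the address unchanged.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#endif

/* Assumption: HWCAP bit advertising address authentication. */
#define SKETCH_HWCAP_PACA (1UL << 30)

/* Read AT_HWCAP where available; report "no features" elsewhere. */
static inline unsigned long sketch_read_hwcap(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return getauxval(AT_HWCAP);
#else
    return 0;
#endif
}

/*
 * Strip the PAC from a return address, but only execute XPACLRI when
 * the kernel says pointer authentication is usable. XPACLRI operates
 * on x30 and is encoded as HINT #7.
 */
static inline void *sketch_strip_pac(void *ret_addr, unsigned long hwcap)
{
#if defined(__aarch64__)
    if (hwcap & SKETCH_HWCAP_PACA) {
        register void *x30 __asm__("x30") = ret_addr;
        __asm__("hint #7" : "+r"(x30));   /* XPACLRI */
        ret_addr = x30;
    }
#else
    (void)hwcap;
#endif
    return ret_addr;
}
```

An unwinder would call `sketch_read_hwcap()` once and pass the cached value to `sketch_strip_pac()` for each frame, so the check costs one branch per frame rather than a trap.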

One option is for KVM to pretend the instruction was a NOP and return to
the guest. But if XPACLRI gets executed frequently, then the constant
trapping might hurt performance. I don't know how frequently it might
get used, as I don't know of any applications currently using it. From
what I understand, it may be used by userspace stack unwinders.

Yep. Probably one instruction per frame being unwound during exception unwinding. And no, trapping will be expensive even though it's *only* in the exception unwind case.

(Also worth noting - as far as I can tell there is no easy way for KVM
to know which pointer authentication instruction caused the trap, so we
may have to do something unusual like use "at s12e1r" to read guest
memory and check for XPACLRI.)

Indeed, it's not an easy fix. As discussed (in the office), we can't
even guarantee that the guest stage 1 translation is stable and points
to the actual XPACLRI instruction.
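
For reference, once the instruction word has been fetched, the check itself is trivial. The sketch below decodes the A64 HINT encoding (XPACLRI is HINT #7, i.e. 0xD50320FF); the function names are hypothetical. It does nothing about the hard part described above, namely fetching the word reliably from guest memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* A64 HINT: fixed bits with a 7-bit immediate (CRm:op2) in bits 11:5. */
#define A64_HINT_MASK    UINT32_C(0xFFFFF01F)
#define A64_HINT_BASE    UINT32_C(0xD503201F)   /* HINT #0, i.e. NOP */
#define A64_HINT_XPACLRI 7u

/* True if the word is any instruction in the HINT space. */
static inline bool a64_is_hint(uint32_t insn)
{
    return (insn & A64_HINT_MASK) == A64_HINT_BASE;
}

/* Extract the HINT immediate; only meaningful if a64_is_hint(). */
static inline unsigned int a64_hint_imm(uint32_t insn)
{
    return (unsigned int)((insn >> 5) & 0x7f);
}

/* True if the word is exactly XPACLRI. */
static inline bool a64_is_xpaclri(uint32_t insn)
{
    return a64_is_hint(insn) && a64_hint_imm(insn) == A64_HINT_XPACLRI;
}
```

The same decode distinguishes the other HINT-space pointer auth instructions, for example PACIASP (HINT #25) and AUTIASP (HINT #29).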

The other option is to turn off trapping entirely. However, on a
big.LITTLE system with mismatched pointer authentication support,
instructions would then work on some CPUs but not on others.

That's another case but let's assume we never see such configurations ;).

That's a broken system by design :) !