Re: [RFC] Circumventing FineIBT Via Entrypoints

From: Andrew Cooper
Date: Thu Feb 13 2025 - 16:24:29 EST

Next message: kan . liang: "[RESEND PATCH] perf/x86/msr: Make SMI and PPERF on by default"
Previous message: Bjorn Helgaas: "Re: [PATCH v2] PCI: Use downstream bridges for distributing resources"
In reply to: Jann Horn: "Re: [RFC] Circumventing FineIBT Via Entrypoints"
Next in thread: Jennifer Miller: "Re: [RFC] Circumventing FineIBT Via Entrypoints"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 13/02/2025 7:23 pm, Jann Horn wrote:
> On Thu, Feb 13, 2025 at 7:15 AM Jennifer Miller <jmill@xxxxxxx> wrote:
>> Here is some napkin asm for this I wrote for the 64-bit syscall entrypoint,
>> I think more or less the same could be done for the other entrypoints.
>>
>> ```
>> endbr64
>> test rsp, rsp
>> js slowpath
>>
>> swapgs
>> ~~fastpath continues~~
>>
>> ; path taken when rsp was a kernel address
>> ; we have no choice really but to switch to the stack from the untrusted
>> ; gsbase but after doing so we have to be careful about what we put on the
>> ; stack
>> slowpath:
>> swapgs

I'm afraid I don't follow. By this point, both basic blocks are the
same (a single swapgs).

Malicious userspace can get onto the slowpath by loading a kernel
pointer into %rsp. Furthermore, if the origin of this really was in the
kernel, then ...

>>
>> ; swap stacks as normal
>> mov QWORD PTR gs:[rip+0x7f005f85],rsp # 0x6014 <cpu_tss_rw+20>
>> mov rsp,QWORD PTR gs:[rip+0x7f02c56d] # 0x2c618 <pcpu_hot+24>

... these are memory accesses using the user %gs. As you note a few
lines lower, %gs isn't safe at this point.

A cunning attacker can make gs:[rip+0x7f02c56d] be a read-only mapping,
at which point we'll have loaded an attacker controlled %rsp, then take
#PF trying to spill %rsp into pcpu_hot, and now we're running the
pagefault handler on an attacker controlled stack and gsbase.

>> ~~normal push and clear GPRs sequence here~~
>>
>> ; we entered with an rsp in the kernel address range.
>> ; we already did swapgs but we don't know if we can trust our gsbase yet.
>> ; we should be able to trust the ro_after_init __per_cpu_offset array
>> ; though.
>>
>> ; check that gsbase is the expected value for our current cpu
>> rdtscp
>> mov rax, QWORD PTR [8*ecx-0x7d7be460] <__per_cpu_offset>
>>
>> rdgsbase rbx
>>
>> cmp rbx, rax
>> je fastpath_after_regs_preserved
>>
>> wrgsbase rax

Irrespective of other things, you'll need some compatibility strategy
for the fact that RDTSCP and {RD,WR}{FS,GS}BASE cannot be used
unconditionally in 64bit mode. It might be as simple as making FineIBT
depend on their presence to activate, but taking a #UD exception in this
path is also a priv-esc vulnerability.

While all CET-IBT capable CPUs ought to have RDTSCP/*BASE, there are
virt environments where this implication does not hold.

>>
>> ; if we reach here we are being exploited and should explode or attempt
>> ; to recover
>> ```
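
To make the RDTSCP/*BASE dependency mentioned above concrete, here is a
rough C sketch of the kind of feature gate that would be needed before
enabling such an entry check. The function name and macro names are made
up for illustration and are not kernel API; the bit positions are the
architectural CPUID ones. The caller is assumed to pass in the raw
register values from CPUID leaf 7 (subleaf 0) and leaf 0x80000001.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CPUID feature bits. */
#define CPUID7_EBX_FSGSBASE    (1u << 0)   /* CPUID.(EAX=7,ECX=0):EBX[0]  */
#define CPUID7_EDX_IBT         (1u << 20)  /* CPUID.(EAX=7,ECX=0):EDX[20] */
#define CPUID_EXT1_EDX_RDTSCP  (1u << 27)  /* CPUID.80000001H:EDX[27]     */

/*
 * Hypothetical helper: report whether an entry path that relies on
 * RDTSCP and RDGSBASE/WRGSBASE could be enabled alongside IBT.
 * Returns false if any of the three features is not enumerated, which
 * is the case that matters for unusual virt configurations.
 */
static bool entry_gsbase_check_usable(uint32_t leaf7_ebx,
                                      uint32_t leaf7_edx,
                                      uint32_t ext1_edx)
{
    bool has_fsgsbase = (leaf7_ebx & CPUID7_EBX_FSGSBASE) != 0;
    bool has_ibt      = (leaf7_edx & CPUID7_EDX_IBT) != 0;
    bool has_rdtscp   = (ext1_edx & CPUID_EXT1_EDX_RDTSCP) != 0;

    return has_ibt && has_fsgsbase && has_rdtscp;
}
```

Note that CPUID only says the instructions are enumerated;
RDGSBASE/WRGSBASE additionally #UD unless CR4.FSGSBASE is set, so a real
gate would have to account for that as well.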
>>
>> The unfortunate part is that it would still result in the register state
>> being dumped on top of some attacker controlled address, so if the error
>> path is recoverable someone could still use entrypoints to convert control
>> flow hijacking into memory corruption via register dump. So it would kill
>> the ability to get ROP but it would still be possible to dump regs over
>> modprobe_path, core_pattern, etc.
> It is annoying that we (as far as I know) don't have a nice clear
> security model for what exactly CFI in the kernel is supposed to
> achieve - though I guess that's partly because in its current version,
> it only happens to protect against cases where an attacker gets a
> function pointer overwrite, but not the probably more common cases
> where the attacker (also?) gets an object pointer overwrite...
>
>> Does this seem feasible and any better than the alternative of overwriting
>> and restoring KERNEL_GS_BASE?
> The syscall entry point is a hot path; my main reason for suggesting
> the RSP check is that I'm worried about the performance impact of the
> gsbase-overwriting approach, but I don't actually have numbers on
> that. I figure a test + conditional jump is about the cheapest we can
> do...

Yeah, this is the cheapest I can think of too. TEST+JS has been able to
macrofuse since the Core2 era.
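
For anyone following along, the check being discussed is just a sign
test on %rsp. A minimal C model of what "test rsp, rsp; js slowpath"
decides (the function name is made up for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Models "test rsp, rsp; js slowpath": JS is taken when bit 63 of the
 * value is set. For canonical addresses that means the upper (kernel)
 * half of the address space. Non-canonical values with bit 63 set are
 * also sent to the slow path.
 */
static bool rsp_takes_slowpath(uint64_t rsp)
{
    return (rsp >> 63) != 0;
}
```

A typical user stack pointer such as 0x00007ffffffff000 stays on the
fast path, while a kernel text address such as 0xffffffff81000000 is
diverted.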

> Do we know how many cycles wrgsbase takes, and how serializing
> is it? Sadly Agner Fog's tables don't seem to list it...

Not (architecturally) serialising, and pretty quick IIRC. It is
microcoded, but the segment registers are renamed so it can execute
speculatively.

~Andrew

>
> How would we actually do that overwriting and restoring of
> KERNEL_GS_BASE? Would we need a scratch register for that?
