Re: [PATCH v3 07/10] x86/ibt: Add paranoid FineIBT mode
From: David Laight
Date: Fri Feb 21 2025 - 08:41:03 EST
On Wed, 19 Feb 2025 17:31:39 +0000
Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> > --- a/arch/x86/include/asm/cfi.h
> > +++ b/arch/x86/include/asm/cfi.h
> > @@ -1116,6 +1129,52 @@ extern u8 fineibt_caller_end[];
> >
> > #define fineibt_caller_jmp (fineibt_caller_size - 2)
> >
> > +/*
> > + * Since FineIBT does hash validation on the callee side it is prone to
> > + * circumvention attacks where a 'naked' ENDBR instruction exists that
> > + * is not part of the fineibt_preamble sequence.
> > + *
> > + * Notably the x86 entry points must be ENDBR and equally cannot be
> > + * fineibt_preamble.
> > + *
> > + * The fineibt_paranoid caller sequence adds additional caller side
> > + * hash validation. This stops such circumvention attacks dead, but at the cost
> > + * of adding a load.
> > + *
> > + * <fineibt_paranoid_start>:
> > + * 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d
> > + * 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d
> > + * a: 4d 8d 5b <f0> lea -0x10(%r11), %r11
I think that 0x10 is the size of the CFI preamble?
There should probably be at least a comment to that effect.
(Maybe there is, but I'm missing the actual patch email.)
> > + * e: 75 fd jne d <fineibt_paranoid_start+0xd>
> > + * 10: 41 ff d3 call *%r11
> > + * 13: 90 nop
> > + *
> > + * Notably LEA does not modify flags and can be reordered with the CMP,
> > + * avoiding a dependency.
Is that even worth saying?
Given that the cpu does 'register renaming' the lea might execute in the
same clock as the mov.
What you do get is a few clocks of stall (maybe 4 if in L1 cache, but
a data read of code memory is unlikely to be there - so it'll be from
the L2 cache) for the memory load.
That means that the jne is speculatively executed (and I think that is
separate from any prefetch speculation), I'll give it 50% taken.
(Or maybe 100% if backwards branches get predicted taken. I don't think
current Intel CPUs do that - they just use whatever is in the branch
prediction slot.)
> > + * Again, using a non-taken (backwards) branch
> > + * for the failure case, abusing LEA's immediate 0xf0 as LOCK prefix for the
> > + * JCC.d8, causing #UD.
> > + */
>
> I don't know what to say. This is equal parts horrifying and beautiful.
Agreed.
Are you absolutely sure that all CPUs have always raised (and always will
raise) #UD for the unexpected LOCK prefix on a Jcc instruction?
My 80386 book does say it will #UD, but I can imagine it being ignored
or even repurposed.
David