Re: [PATCH] x86/kcfi: Optimize call sequence

From: David Laight

Date: Wed Jun 17 2026 - 08:36:52 EST


On Wed, 17 Jun 2026 13:12:07 +0200
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Wed, Jun 17, 2026 at 10:26:43AM +0100, David Laight wrote:
>
> > > > I think it would also be better if the code doing the patching checked
> > > > what it was overwriting.
> > >
> > > Ye of little faith :-)
> >
> > I wouldn't want to have to debug the consequences of getting it wrong.
> > (The same goes for patching into function preamble.)
>
> Been there, done that etc. :-) I'm the weirdo that's written all this
> code.

And I'm one of the weirdos who knows enough asm (various) to understand
what it is all doing :-)

...
> The thing is, objtool validates the retpolines are preceded by UD2 as
> marker for kCFI and complains when this is not so (there must not be
> unannotated indirect calls). And the code that is patching is already
> checking there is that mov into %r10d at the expected offset.
>
> The update poke happens when both those are true; (leading mov and
> trailing UD2), verifying things again has very little added value.

Ok, there is a check a bit earlier but not in this bit.
I'm wary that blindly trusting an address from some table could easily
lead to problems.
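
For what it's worth, the sort of re-check I had in mind is tiny. A rough
userspace sketch, not kernel code: the helper name and the two offset
parameters are made up for illustration, since the exact layout isn't
spelled out in this thread. The only things it relies on are the usual
encodings (41 ba imm32 for the mov into %r10d, 0f 0b for UD2).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* movl $imm32, %r10d encodes as 41 ba <imm32>; ud2 encodes as 0f 0b. */
static const uint8_t mov_r10d_opcode[2] = { 0x41, 0xba };
static const uint8_t ud2_insn[2] = { 0x0f, 0x0b };

/*
 * Hypothetical helper: 'mov_off' and 'ud2_off' say where the caller
 * expects the leading mov and the trailing ud2 inside 'site'. They are
 * parameters because the real offsets are an assumption here.
 * Returns true only if both markers are present and fully inside 'len'.
 */
static bool kcfi_site_looks_sane(const uint8_t *site, size_t len,
				 size_t mov_off, size_t ud2_off)
{
	/* The mov is 6 bytes long (2 opcode bytes + imm32). */
	if (mov_off > len || len - mov_off < 6)
		return false;
	if (ud2_off > len || len - ud2_off < 2)
		return false;
	if (memcmp(site + mov_off, mov_r10d_opcode, sizeof(mov_r10d_opcode)))
		return false;
	return !memcmp(site + ud2_off, ud2_insn, sizeof(ud2_insn));
}
```

The point is only that the poke would refuse to write when the bytes at
the table-supplied address aren't what it expects.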

...
> > > Also, objtool
> > > typically avoids actually modifying code and generally prefers to just
> > > ship additional sections such that the kernel can modify itself. There
> > > is an exception to this, but there was definite grumbling about that.
> >
> > At least this one is an optimisation.
> > The advantage of getting objtool to do the change is that objdump will
> > then show the code that is being executed.
>
> Given the amount of self modifying code, that's a dream. Also, on
> anything half recent from Intel, it'll all be rewritten to FineIBT,
> which is wildly different from what objdump will be showing you.
>
> The only way to truly see what's running is to disassemble the live
> image -- either through /proc/kcore or some virtual machine gdb server.

I did have a local change that generated different nop*3 so I could tell
what was lfence, stac, clac (etc).
Trying to check the compiler output was hard when there were blocks of
6 nop.
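
Roughly the idea, as a userspace sketch rather than the change itself.
It assumes 'nop*3' means a 3-byte nop (lfence, stac and clac are all
3 bytes). The mapping of encodings to instructions below is invented
for the example; it just uses the fact that 0f 1f /0 is a nop for any
simple register ModRM, so (%rax), (%rcx) and (%rdx) give three
distinguishable 3-byte nops.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum nop3_tag { NOP3_LFENCE, NOP3_STAC, NOP3_CLAC, NOP3_NR };

/*
 * nopl (%rax), nopl (%rcx), nopl (%rdx): same 0f 1f opcode, different
 * ModRM byte. Which one stands for which instruction is an assumption
 * made for this example only.
 */
static const uint8_t tagged_nop3[NOP3_NR][3] = {
	[NOP3_LFENCE] = { 0x0f, 0x1f, 0x00 },
	[NOP3_STAC]   = { 0x0f, 0x1f, 0x01 },
	[NOP3_CLAC]   = { 0x0f, 0x1f, 0x02 },
};

static const char *const nop3_name[NOP3_NR] = {
	[NOP3_LFENCE] = "lfence",
	[NOP3_STAC]   = "stac",
	[NOP3_CLAC]   = "clac",
};

/* Return the name of the instruction a tagged nop stands in for, or NULL. */
static const char *nop3_classify(const uint8_t *insn)
{
	for (size_t i = 0; i < NOP3_NR; i++) {
		if (!memcmp(insn, tagged_nop3[i], 3))
			return nop3_name[i];
	}
	return NULL;
}
```

With something like that, a block of 6 nop bytes in the disassembly
reads as two recognisable placeholders instead of anonymous padding.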

David