Re: [GIT PULL] RISC-V updates for v7.0

From: Deepak Gupta

Date: Thu Feb 26 2026 - 16:04:34 EST

Hi Peter,

Responses inline.

On Thu, Feb 26, 2026 at 02:23:42PM +0100, Peter Zijlstra wrote:

On Wed, Feb 18, 2026 at 05:57:45PM -0800, Deepak Gupta wrote:

x86 doesn't have any equivalent of the BTI bit in PTEs to mark code pages. IIRC,
it does have a mechanism where a bitmap has to be prepared and each entry in the
bitmap encodes whether a page is a legacy code page (without `endbr64`) or a
modern code page (with `endbr64`). The CPU consults this bitmap to suppress the
fault.

So; all of this is only ever relevant for programs that are mixing CFI
and !CFI code. If a program has no CFI, all good. If a program is all
CFI enabled, also all good.

If it starts mixing things, then you get to be 'creative'.

Now the thing is, if you start to do that you need to deal with both
forward CFI (BTI) and backward CFI (shadow-stack) #CF exceptions. That
bitmap can only deal with BTI; it doesn't help with shadow stack, so
it's useless.

My proposal was to ignore that whole bitmap; that's dead hardware, never
used. Instead use a software PTE bit, like ARM has, and simply eat the
#CF look at PTE and figure out what to do.

IIRC, Arm has a hardware PTE bit saying this is a guarded page. That can be kept
in the ITLB as part of virtual address translation during instruction fetch. So
whenever indir_call --> target happens, if the target translation is already in
the ITLB, the CPU already knows whether to suppress the fault or not, without
going to the kernel.

In the x86 case, using a software PTE bit would be different. There will always
be a fault, and the kernel won't be able to decide on its own what to do. It'll
need to delegate that decision. The delegate can be a signal handler in
userspace, which may need a bitmap or some auxiliary data structure to decide
whether the target address is a valid branch target or should not be taken.
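
To make the cost concrete, here is a rough sketch of the kind of lookup such a
userspace handler would have to do on every fault. Everything in it is
hypothetical (`struct legacy_map`, both function names, the one-bit-per-4K-page
layout); it is not an existing kernel or libc interface, just an illustration
of the software-bitmap option.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u /* assumption: 4 KiB pages */

/* Hypothetical per-process table, one bit per page of a covered range. */
struct legacy_map {
	uintptr_t base;      /* page-aligned start of the covered range */
	size_t npages;       /* number of pages covered */
	unsigned char *bits; /* bit set => legacy code page (no landing pads) */
};

/* Mark the page containing addr as legacy code. */
static void legacy_map_mark(struct legacy_map *m, uintptr_t addr)
{
	size_t idx = (addr - m->base) >> PAGE_SHIFT;

	assert(addr >= m->base && idx < m->npages);
	m->bits[idx / CHAR_BIT] |= (unsigned char)(1u << (idx % CHAR_BIT));
}

/*
 * Called from the (hypothetical) fault path: true means the indirect
 * branch to target may proceed even though no landing pad was found.
 */
static bool legacy_map_allows(const struct legacy_map *m, uintptr_t target)
{
	size_t idx;

	if (target < m->base)
		return false;
	idx = (target - m->base) >> PAGE_SHIFT;
	if (idx >= m->npages)
		return false;
	return (m->bits[idx / CHAR_BIT] >> (idx % CHAR_BIT)) & 1u;
}
```

The lookup itself is cheap; the expensive part is that every indirect branch
into a legacy page first takes a fault and a round trip through the kernel.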

So the decision point is either

- a software bitmap, or
- the hardware bitmap (legacy interworking bitmap)
(both will be slow).

OR

Just don't allow/support enabling CFI in that configuration, and put the onus
on the workload owner to do the work needed to enable the feature.

Sidenote: I wish we had been able to convince a certain someone in Redmond to
give a sw bit back; all of this would have been nicer. Given there wasn't a lot
of traction from open source for this feature, it was mostly a Redmond-driven
feature.

Yes, this is 'slow', but my claim is that this doesn't matter. There are
2 ways out of this slow-ness:

- fully disable CFI for your program (probably not the thing you want,
but a quick fix, and not really less secure than partial CFI anyway).

- fully enable CFI for your program (might be a bit of work).

The whole mixed thing is a transition state where userspace doesn't have
its ducks in a row. It will go away.

I spent 8 years at Intel defining features to kill classes of low-level
exploits, and then the next 6 years in places where software is deployed on
these CPUs.
I am a security engineer and would have loved to get these features enabled.
But in all honesty, I have yet to see anyone at these places (hyperscalers)
willing to give up an ounce of perf budget (1-2% demands discussion and strong
justification) for enabling just the shadow stack feature.

So my advice would be not to care about any enablement path that comes with a
perf hit.

Keep it simple
- Enable when all binaries have feature awareness.
- Disable when there is one binary with no feature awareness.
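
As a minimal sketch of that rule (the feature-bit names and the function are
made up for illustration; a real loader would derive the per-object bits from
whatever markers the ABI defines), the decision is just an AND across every
object in the process:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits advertised by each loaded object. */
#define FEAT_LANDING_PAD  (1u << 0) /* forward-edge CFI aware */
#define FEAT_SHADOW_STACK (1u << 1) /* backward-edge CFI aware */

/*
 * Return the set of CFI features that may be enabled for a process.
 * A feature survives only if every object advertises it; a single
 * unaware object switches it off. No objects means nothing is enabled.
 */
static unsigned int cfi_features_to_enable(const unsigned int *obj_feats,
					   size_t n)
{
	unsigned int enabled = FEAT_LANDING_PAD | FEAT_SHADOW_STACK;
	size_t i;

	assert(obj_feats != NULL || n == 0);
	if (n == 0)
		return 0;
	for (i = 0; i < n; i++)
		enabled &= obj_feats[i];
	return enabled;
}
```

With this there is no mixed state to handle at runtime, so no fault-and-lookup
slow path is needed at all.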