Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems

From: Xin Li

Date: Wed Apr 01 2026 - 11:16:28 EST



Thanks!
Xin

> On Mar 31, 2026, at 8:15 PM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> On March 31, 2026 6:59:06 PM PDT, Xin Li <xin@xxxxxxxxx> wrote:
>>
>>
>>>> On Mar 30, 2026, at 11:03 PM, Xin Li <xin@xxxxxxxxx> wrote:
>>>
>>>
>>>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>>>>
>>>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>>>> to fail.
>>>>>>>
>>>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>>>> ERETU) preserves them.
>>>>>>>
>>>>>>
>>>>>> I don't really like this. I think we have two credible choices:
>>>>>>
>>>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>>>> R11 and RCX on entry and exit. And update the test to actually test
>>>>>> this.
>>>>>>
>>>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>>>> preserves all registers.
>>>>>>
>>>>>> I'm in favor of #2. People love making new programming languages and
>>>>>> runtimes and inline asm and, these days, vibe coded crap. And it's
>>>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>>>> they're clobbered. And it's easy to test on FRED (well, not really,
>>>>>> but it hopefully will be some day) and it's easy to publish one's
>>>>>> code, and then everyone is a bit screwed when the resulting program
>>>>>> crashes sometimes on non-FRED systems. And it will be miserable to
>>>>>> debug.
>>>>>>
>>>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>>>> function calls, so one can get into a situation in which one's
>>>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>>>> these registers until an inlining decision changes or some code gets
>>>>>> reordered, and then it will start failing. And making the failure
>>>>>> depend on hardware details is just nasty.)
>>>>>>
>>>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>>>> on FRED to match non-FRED.
>>>>>
>>>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>>>> FRED systems is by far the safest choice.
>>>>>
>>>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>>>> userspace state between machines might be 'surprised'.
>>>>
>>>> Thanks Andy and Peter.
>>>>
>>>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>>>> is not a good practice. The selftest should validate ABI consistency.
>>>>
>>>> I agree with Andy's option #2, so this should be fixed in the FRED
>>>> syscall entry implementation.
>>>>
>>>> Li Xin, does this direction look right to you? I can assist with
>>>> validation and keep the selftest aligned with the agreed ABI.
>>>>
>>>
>>> Yes, consistency should take precedence over hardware-specific variations.
>>>
>>> I would like to hear from Andrew Cooper and hpa before we do it.
>>
>> Per Andy’s suggestion, the change would be:
>>
>> diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
>> index 88c757ac8ccd..a19898747a2c 100644
>> --- a/arch/x86/entry/entry_fred.c
>> +++ b/arch/x86/entry/entry_fred.c
>> @@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
>>  {
>>  	/* The compiler can fold these conditions into a single test */
>>  	if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
>> +		regs->cx = regs->ip;
>> +		regs->r11 = regs->flags;
>> +
>>  		regs->orig_ax = regs->ax;
>>  		regs->ax = -ENOSYS;
>>  		do_syscall_64(regs, regs->orig_ax);
>>
>> It adds 4 extra MOVs on this hot path, but I don’t see that as a problem here.
>
> We discussed this over a year ago, and at that point agreed that preserving the register was the desired behavior. Why has this changed now?

Yes, that is technically cleaner.

The question is, is the RCX/R11 clobbering behavior an established architectural contract, or is it an implementation detail that software ignores?

I think Andy and Peter want to be on the safer side, which kind of assumes that this is established.
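
To make the two candidate behaviors concrete, here is a minimal userspace model of the contract under discussion. It is only a sketch and not kernel code: struct model_regs and the model_* helpers are hypothetical names made up for illustration, and the RCX == RIP half of the check is assumed from Andy's option #2 rather than taken from the selftest source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct pt_regs. */
struct model_regs {
	uint64_t ip;    /* user return RIP */
	uint64_t flags; /* user RFLAGS */
	uint64_t cx;    /* user RCX as seen after syscall entry */
	uint64_t r11;   /* user R11 as seen after syscall entry */
};

/* Legacy SYSCALL: the instruction copies RIP to RCX and RFLAGS to R11. */
static void model_legacy_syscall_entry(struct model_regs *regs)
{
	regs->cx = regs->ip;
	regs->r11 = regs->flags;
}

/*
 * FRED entry: state is saved on the stack, so RCX and R11 keep their
 * userspace values unless software emulates the legacy clobber, as the
 * two added assignments in the proposed diff would do.
 */
static void model_fred_syscall_entry(struct model_regs *regs,
				     bool emulate_clobber)
{
	if (emulate_clobber) {
		regs->cx = regs->ip;
		regs->r11 = regs->flags;
	}
}

/* Option #2 as a predicate: RCX == RIP and R11 == RFLAGS after entry. */
static bool model_abi_check(const struct model_regs *regs)
{
	return regs->cx == regs->ip && regs->r11 == regs->flags;
}
```

In this model, model_abi_check() fails after model_fred_syscall_entry(regs, false) whenever userspace left other values in RCX or R11, and passes once emulate_clobber is true, which is the consistency the selftest would then be validating on both entry paths.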