Re: [PATCH urgent v2] x86, asm: Disable opportunistic SYSRET if regs->flags has TF set

From: Denys Vlasenko
Date: Thu Apr 02 2015 - 08:59:51 EST

On 04/02/2015 02:31 PM, Ingo Molnar wrote:
> * Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
>> On 04/02/2015 01:14 PM, Brian Gerst wrote:
>>>>>> So I merged this as it's an obvious bugfix, but in hindsight I'm
>>>>>> really uneasy about the whole opportunistic SYSRET concept: it appears
>>>>>> that the chance that %rcx matches return-%rip is astronomically small - this
>>>>>> is why this bug wasn't noticed live so far.
>>>>>> So should we really be doing this?
>>>>> Andy does this not for the off-chance that userspace's RCX is equal
>>>>> to return address and R11 == RFLAGS. The chances of that are
>>>>> astronomically small.
>>>>> This code path triggers when ptrace/audit/seccomp is active. Instead
>>>>> of torturing ourselves trying not to divert into the IRET return path,
>>>>> the code is now steered that way. But then, immediately before the actual
>>>>> IRET, we check again: "do we really need IRET?", IOW "did ptrace really
>>>>> touch pt_regs->ss? ->flags? ->rip? ->rcx?", which in the vast majority of
>>>>> cases will not be true.
>>>> I keep forgetting about that, my test systems have the audit muck
>>>> turned off ;-)
>>>> Fair enough - and it's sensible to share the IRET path between
>>>> interrupts and complex-return system calls, even though the check
>>>> is unnecessary overhead for the pure interrupt return path...
>>> Maybe we could reintroduce TIF_IRET for this purpose instead of
>>> (ab)using TIF_NOTIFY_RESUME. Then we would only do the opportunistic
>>> check for those cases (ptrace, audit, exec, sigreturn, etc.), and skip
>>> it for interrupts.
>> The very first check in the existing code, pt_regs->cx ==
>> pt_regs->ip, will fail for interrupt returns.
>> You can hardly save anything by placing a (ti->flags &
>> TIF_TRY_SYSRET) check in front of it; it's almost as expensive.
> Well, what I was thinking of was to have a pure irq (well, async
> context) return path, not shared with the weird-syscall-IRET return
> path at all ...
> It would be open coded, not obfuscated via macros.
> That way AFAICS the upsides are:
> - it's easier to read (and maintain) what goes on in which case.
> '*intr*' labels would truly identify interrupt return related
> processing, for a change!

Re labels: I fully agree they need cleanup (mass rename).

Something along the lines of

int_ret_from_sys_call -> return_from_syscall
int_with_check -> sysret_check_workmask_in_edi
int_careful -> sysret_check_NEED_RESCHED
int_very_careful -> sysret_check_SYSCALL_EXIT
int_signal -> sysret_check_DO_NOTIFY_MASK
int_restore_rest -> sysret_next_check

ret_from_intr -> return_from_intr
retint_with_reschedule -> intr_check_WORK_MASK
retint_check -> intr_check_workmask_in_edi
retint_careful -> intr_check_NEED_RESCHED
retint_signal -> intr_check_DO_NOTIFY_MASK

retint_swapgs -> return_from_syscall_or_intr
irq_return_via_sysret -> return_via_sysret

retint_kernel -> intr_check_preempt
restore_args -> restore_c_regs
irq_return -> return_via_iret

and then your proposal can be rephrased as "let's stop
merging sysret and intr code paths at retint_swapgs".

Makes sense. It would entail some code duplication,
but the code will be easier to maintain.
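As a rough illustration, the split could be modelled in C rather than entry_64.S assembly. This is a sketch under stated assumptions, not kernel code: the function names borrow the proposed labels return_from_syscall and return_from_intr, while struct sketch_regs, can_use_sysret, enum ret_kind and the USER_CS/USER_DS/VIRTUAL_MASK_SHIFT constants are hypothetical stand-ins. The eligibility test mirrors the checks discussed in this thread (RCX == RIP, R11 == RFLAGS, user CS and SS), the canonical-address test done before SYSRET, and the RF/TF test this patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical local stand-ins for x86-64 values; the kernel takes these
 * from its own headers.
 */
#define X86_EFLAGS_TF      (1UL << 8)   /* trap flag (single-step) */
#define X86_EFLAGS_RF      (1UL << 16)  /* resume flag */
#define USER_CS            0x33UL       /* 64-bit user code selector */
#define USER_DS            0x2bUL       /* user data/stack selector */
#define VIRTUAL_MASK_SHIFT 47           /* 48-bit virtual addresses */

static_assert(X86_EFLAGS_TF == 0x100UL, "TF is EFLAGS bit 8");
static_assert(X86_EFLAGS_RF == 0x10000UL, "RF is EFLAGS bit 16");

/* Hypothetical, trimmed-down stand-in for struct pt_regs. */
struct sketch_regs {
	uint64_t ip, cx, flags, r11, cs, ss;
};

enum ret_kind { RETURN_VIA_SYSRET, RETURN_VIA_IRET };

/* True only if SYSRET would restore exactly what IRET would restore. */
static bool can_use_sysret(const struct sketch_regs *r)
{
	if (r->cx != r->ip)              /* SYSRET loads RIP from RCX */
		return false;
	if (r->cx >> VIRTUAL_MASK_SHIFT) /* non-canonical or kernel address */
		return false;
	if (r->cs != USER_CS)            /* CS must match what SYSRET sets */
		return false;
	if (r->r11 != r->flags)          /* SYSRET loads RFLAGS from R11 */
		return false;
	/* RF cannot be restored by SYSRET; TF would trap right after it. */
	if (r->r11 & (X86_EFLAGS_RF | X86_EFLAGS_TF))
		return false;
	if (r->ss != USER_DS)            /* SS must match what SYSRET sets */
		return false;
	return true;
}

/* Syscall exit path: worth trying the cheaper SYSRET. */
static enum ret_kind return_from_syscall(const struct sketch_regs *r)
{
	return can_use_sysret(r) ? RETURN_VIA_SYSRET : RETURN_VIA_IRET;
}

/* Interrupt exit path: never runs the check, always returns via IRET. */
static enum ret_kind return_from_intr(const struct sketch_regs *r)
{
	(void)r;
	return RETURN_VIA_IRET;
}
```

In this model only the syscall exit pays for the comparisons, while an interrupt return goes straight to IRET. The price is the duplication mentioned above: in the real assembly, anything shared by both tails would have to be kept in sync by hand.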

> - we can optimize in a more directed fashion - like here
> ... while the downsides are:
> - more code
> - a (small) chance of a fix going to one path while not the other.
> How much extra code would it be?

A screenful or two.