Re: [PATCH] x86/asm/entry/64: better check for canonical address

From: Denys Vlasenko
Date: Fri Mar 27 2015 - 06:45:17 EST


On 03/27/2015 09:11 AM, Ingo Molnar wrote:
>
> * Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
>
>> This change makes the check exact (no more false positives
>> on kernel addresses).
>>
>> It isn't really important to be fully correct here -
>> almost all addresses we'll ever see will be userspace ones,
>> but OTOH it looks to be cheap enough:
>> the new code uses two more ALU ops but preserves %rcx,
>> so it no longer has to be reloaded from pt_regs->cx.
>> On disassembly level, the changes are:
>>
>> cmp %rcx,0x80(%rsp) -> mov 0x80(%rsp),%r11; cmp %rcx,%r11
>> shr $0x2f,%rcx -> shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11
>> mov 0x58(%rsp),%rcx -> (eliminated)
>>
>> Signed-off-by: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
>> CC: Borislav Petkov <bp@xxxxxxxxx>
>> CC: x86@xxxxxxxxxx
>> CC: linux-kernel@xxxxxxxxxxxxxxx
>> ---
>>
>> Andy, I'm undecided myself on the merits of doing this.
>> If you like it, feel free to take it in your tree.
>> I trimmed CC list to not bother too many people with this trivial
>> and quite possibly "useless churn"-class change.
>>
>> arch/x86/kernel/entry_64.S | 23 ++++++++++++-----------
>> 1 file changed, 12 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index bf9afad..a36d04d 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -688,26 +688,27 @@ retint_swapgs: /* return to user-space */
>> * a completely clean 64-bit userspace context.
>> */
>> movq RCX(%rsp),%rcx
>> - cmpq %rcx,RIP(%rsp) /* RCX == RIP */
>> + movq RIP(%rsp),%r11
>> + cmpq %rcx,%r11 /* RCX == RIP */
>> jne opportunistic_sysret_failed
>
> Btw., in the normal syscall entry path, RIP(%rsp) == RCX(%rsp),
> because we set up pt_regs like that - and at this point RIP/RCX is
> guaranteed to be canonical, right?
>
> So if there's a mismatch generated, it's the kernel's doing.

This is an optimization on the IRET exit code path.

We get here when we know that pt_regs may have been modified, e.g. by ptrace.

I think we also go here even on interrupt return.
(Granted, chances that RCX was the same as RIP at the moment of interrupt
are slim, but we still would check that and (ab)use SYSRET
if it looks like it'll work).
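
To spell out why RCX has to equal RIP: SYSRET loads the user RIP
from RCX, so it can stand in for IRET only when the two saved values
agree. A rough C rendering of that first comparison follows. The
struct and function names are made up for illustration; the real code
is the cmpq in the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two pt_regs slots the check reads. */
struct saved_frame {
    uint64_t cx; /* saved RCX, i.e. RCX(%rsp) */
    uint64_t ip; /* saved RIP, i.e. RIP(%rsp) */
};

/*
 * SYSRET takes the return address from RCX. If ptrace rewrote either
 * slot, or we arrived via an interrupt with an unrelated RCX, the two
 * differ and we have to fall back to IRET.
 */
static bool rcx_matches_rip(const struct saved_frame *f)
{
    return f->cx == f->ip;
}
```

A frame saved at syscall entry and left alone passes this test; one
where a tracer changed only the return address does not.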


> Why don't we detect those cases where a new return address is created
> (ptrace, exec, etc.), check for canonicalness and add a TIF flag for
> it (and add it to the work mask) and execute the IRET from the slow
> path?
>
> We already have a work-mask branch.
>
> That would allow the removal of all these checks and canonization from
> the fast return path! We could go straight to the SYSRET...

The point is, this is not a fast return path.

It's a "let's try to use fast SYSRET instead of IRET" path.
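
For reference, here is what the canonical-address part of that attempt
looks like before and after the patch, written as C. This is only a
sketch. It assumes 48-bit virtual addresses (matching the $0x2f and
$0x10 shift counts in the disassembly), it assumes the old sequence
branched to the failure label on a nonzero result (the quoted hunk
does not show that branch), and the function names are invented.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Old check: shr $0x2f,%rcx and require the result to be zero.
 * Passes only when bits 63..47 are all clear, so every kernel
 * address is rejected along with the truly non-canonical ones.
 */
static bool old_check_passes(uint64_t addr)
{
    return (addr >> 47) == 0;
}

/*
 * New check: shl $0x10; sar $0x10; cmp against the original value.
 * Sign-extends bit 47 into bits 63..48 and compares. This relies on
 * >> being an arithmetic shift for negative signed values, which is
 * implementation-defined in C but holds for GCC and Clang.
 */
static bool new_check_passes(uint64_t addr)
{
    int64_t sext = (int64_t)(addr << 16) >> 16;
    return (uint64_t)sext == addr;
}
```

Both versions accept a userspace address and reject something like
0x0000800000000000. They differ only on kernel-half addresses such as
0xffffffff81000000, which the old check refuses and the new one
accepts.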


> The frequency of exec() and ptrace() is 2-3 orders of magnitude lower
> than the frequency of system calls, so this would be well worth it.

On untraced system calls, we don't come here. We go to SYSRET
without these checks.
