Re: the usage of __SYSCALL_MASK in entry_SYSCALL_64/do_syscall_64 is not consistent

From: Kees Cook
Date: Tue Jun 21 2016 - 15:01:44 EST


On Mon, Jun 20, 2016 at 10:53 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> On 06/19, Andy Lutomirski wrote:
>>
>> Something's clearly buggy there,
>
> The usage of __X32_SYSCALL_BIT doesn't look right either. Nothing
> serious, but still.
>
> Damn, initially I thought I had found a serious bug in entry_64.S,
> and it took me some time to understand why my exploit didn't work ;)
> So I learned that
>
>         andl $__SYSCALL_MASK, %eax
>
> in entry_SYSCALL_64_fastpath() is a 32-bit operation, so it
> zero-extends %rax (the upper 32 bits are cleared), and thus
>
>         cmpl $__NR_syscall_max, %eax
>         ...
>         call *sys_call_table(, %rax, 8)
>
> is correct (when the call is reached, rax <= __NR_syscall_max).
>
> OK, so entry_64.S simply "ignores" the upper bits if
> CONFIG_X86_X32_ABI is set.
> Fine, but this doesn't match the
>
>         if (likely((nr & __SYSCALL_MASK) < NR_syscalls))
>
> check in do_syscall_64(), where nr is the full 64-bit value. So this
> test case
>
> #include <stdio.h>
>
> int main(void)
> {
>         // __NR_exit == 0x3c
>         asm volatile ("movq $0xFFFFFFFF0000003c, %rax; syscall");
>
>         printf("I didn't exit because I am traced\n");
>
>         return 0;
> }
>
> silently exits if not traced; if traced, the slow path's 64-bit mask
> check rejects the number, the syscall fails with -ENOSYS, and
> printf() runs.
>
> Should we do something, or do we not care?

The slow path has seccomp, so there's no filter bypass with this. I
think it should get corrected, just for proper behavior, but it
currently looks harmless. It does, technically, double the attack
surface for userspace ROP-ish attacks, since the top half of the
register can now be all ones (0xFFFFFFFF) instead of just zero, but
that's probably not a very big deal.
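
For concreteness, here's a minimal userspace sketch of the mismatch
(not kernel code; it borrows the arch/x86 definitions, where
__X32_SYSCALL_BIT is 0x40000000 and __SYSCALL_MASK is
~__X32_SYSCALL_BIT under CONFIG_X86_X32_ABI; the NR_syscalls value
below is just a placeholder):

#include <stdio.h>

#define __X32_SYSCALL_BIT 0x40000000
#define __SYSCALL_MASK    (~(__X32_SYSCALL_BIT))
#define NR_syscalls       400   /* placeholder; the real value is generated */

int main(void)
{
        unsigned long nr = 0xFFFFFFFF0000003cUL;

        /* Fast path: "andl $__SYSCALL_MASK, %eax" is a 32-bit op, so
         * the upper 32 bits of %rax are cleared (zero-extension). */
        unsigned long fast = (unsigned int)nr & (unsigned int)__SYSCALL_MASK;

        /* Slow path: do_syscall_64() masks the full 64-bit nr; the
         * int-typed mask sign-extends, so the upper bits survive. */
        unsigned long slow = nr & __SYSCALL_MASK;

        printf("fast path: %#lx -> %s\n", fast,
               fast < NR_syscalls ? "dispatched" : "rejected (-ENOSYS)");
        printf("slow path: %#lx -> %s\n", slow,
               slow < NR_syscalls ? "dispatched" : "rejected (-ENOSYS)");
        return 0;
}

The fast-path masking reduces the value to 0x3c, which would be
dispatched as exit(), while the slow-path value keeps the high bits
and gets rejected with -ENOSYS -- exactly why the test case above
exits only when it isn't traced.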

-Kees

--
Kees Cook
Chrome OS & Brillo Security