Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test

From: Andy Lutomirski
Date: Thu Sep 10 2015 - 15:18:00 EST


On Thu, Sep 10, 2015 at 12:04 PM, Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
> On 09/10/2015 12:01 AM, Andy Lutomirski wrote:
>> On Mon, Sep 7, 2015 at 8:56 AM, Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
>>> This new test checks that all x86 registers are preserved across
>>> 32-bit syscalls. It tests syscalls through VDSO (if available)
>>> and through INT 0x80, normally and under ptrace.
>>>
>>> If kernel is a 64-bit one, high registers (r8..r15) are poisoned
>>> before the syscall is called and are checked afterwards.
>>>
>>> They must be either preserved, or cleared to zero (but r11 is special);
>>> r12..15 must be preserved for INT 0x80.
>>>
>>> EFLAGS is checked for changes too, but change there is not
>>> considered to be a bug (paravirt kernels do not preserve
>>> arithmetic flags).
>>>
>>> Run-tested on 64-bit kernel:
>>>
>>> $ ./test_syscall_vdso_32
>>> [RUN] Executing 6-argument 32-bit syscall via VDSO
>>> [OK] Arguments are preserved across syscall
>>> [NOTE] R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
>>> [OK] R8..R15 did not leak kernel data
>>> [RUN] Executing 6-argument 32-bit syscall via INT 80
>>> [OK] Arguments are preserved across syscall
>>> [OK] R8..R15 did not leak kernel data
>>> [RUN] Running tests under ptrace
>>> [RUN] Executing 6-argument 32-bit syscall via VDSO
>>> [OK] Arguments are preserved across syscall
>>> [OK] R8..R15 did not leak kernel data
>>> [RUN] Executing 6-argument 32-bit syscall via INT 80
>>> [OK] Arguments are preserved across syscall
>>> [OK] R8..R15 did not leak kernel data
>>>
>>> On 32-bit paravirt kernel:
>>>
>>> $ ./test_syscall_vdso_32
>>> [NOTE] Not a 64-bit kernel, won't test R8..R15 leaks
>>> [RUN] Executing 6-argument 32-bit syscall via VDSO
>>> [WARN] Flags before=0000000000200ed7 id 0 00 o d i s z 0 a 0 p 1 c
>>> [WARN] Flags after=0000000000200246 id 0 00 i z 0 0 p 1
>>> [WARN] Flags change=0000000000000c91 0 00 o d s 0 a 0 0 c
>>> [OK] Arguments are preserved across syscall
>>> [RUN] Executing 6-argument 32-bit syscall via INT 80
>>> [OK] Arguments are preserved across syscall
>>> [RUN] Running tests under ptrace
>>> [RUN] Executing 6-argument 32-bit syscall via VDSO
>>> [OK] Arguments are preserved across syscall
>>> [RUN] Executing 6-argument 32-bit syscall via INT 80
>>> [OK] Arguments are preserved across syscall
>>>
>>> Signed-off-by: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
>>> CC: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>> CC: Steven Rostedt <rostedt@xxxxxxxxxxx>
>>> CC: Ingo Molnar <mingo@xxxxxxxxxx>
>>> CC: Borislav Petkov <bp@xxxxxxxxx>
>>> CC: "H. Peter Anvin" <hpa@xxxxxxxxx>
>>> CC: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>>> CC: Oleg Nesterov <oleg@xxxxxxxxxx>
>>> CC: Frederic Weisbecker <fweisbec@xxxxxxxxx>
>>> CC: Alexei Starovoitov <ast@xxxxxxxxxxxx>
>>> CC: Will Drewry <wad@xxxxxxxxxxxx>
>>> CC: Kees Cook <keescook@xxxxxxxxxxxx>
>>> CC: x86@xxxxxxxxxx
>>> CC: linux-kernel@xxxxxxxxxxxxxxx
>>
>> Acked-by: Andy Lutomirski <luto@xxxxxxxxxx>
>>
>> with minor caveats below, none of which are show-stoppers...
>>
>>> + /* INT80 syscall entrypoint can be used by
>>> + * 64-bit programs too, unlike SYSCALL/SYSENTER.
>>> + * Therefore it must preserve R12+
>>> + * (they are callee-saved registers in 64-bit C ABI).
>>> + *
>>> + * This was probably historically not intended,
>>> + * but R8..11 are clobbered (cleared to 0).
>>> + * IOW: they are the only registers which aren't
>>> + * preserved across INT80 syscall.
>>> + */
>>> + if (*r64 == 0 && num <= 11)
>>> + continue;
>>
>> Ugh. I'll change my big entry patchset to preserve these and maybe to
>> preserve all of the 64-bit regs.
>
> If you do that, this won't change the ABI: we don't _promise_
> to save them. If we accidentally do, that means nothing.
>
> If you do that, the test won't fail. The code above does
> not require regs to be 0 - there is further code which
> also allow them to be unchanged.
>
> (I'm not very comfortable about additional six push/pops
> which are necessary for this to happen. I'm surprised
> maintainers tentatively agreed to that -
> I was grilled and asked to prove with measurements
> that *one* additional push+pop wasn't adding significant overhead).

I suspect that I need to make the series faster.

Also, int $0x80 isn't a fast path for any legitimate use case except
Debian, and I would argue that Debian is just buggy.

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/