Re: [RFC 2/2] x86/pti/64: Remove the SYSCALL64 entry trampoline

From: Linus Torvalds
Date: Sun Jul 22 2018 - 14:28:14 EST


On Sun, Jul 22, 2018 at 10:45 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> This patch changes the code to map the percpu TSS into the user page
> tables to allow the non-trampoline SYSCALL64 path to work under PTI.

Me likey.

However:

> This does not add a new direct information leak, since the TSS is
> readable by Meltdown from the cpu_entry_area alias regardless.

Afaik, it does now potentially expose the per-thread entry stack info
through Meltdown, which is new.

But I don't think that's a show-stopper.

> static void __init pti_clone_user_shared(void)
> {
> + for_each_possible_cpu(cpu) {

But this code is pretty disgusting and seems wrong.

Do you really want to do all the _possible_ CPUs, not just the
online ones? I'd rather expose less (think MAXCPU) and then have the
CPU hotplug code expose the page as the CPU comes up?
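
A rough userspace model of the "expose on hotplug" idea, just to make
the difference from for_each_possible_cpu() concrete. This is not
kernel code: MODEL_NR_CPUS, tss_user_mapped[] and the model_*()
functions are all made up for illustration. In the kernel the
callbacks would presumably be registered with something like
cpuhp_setup_state(), and whether the page should be unmapped again on
offline is an assumption here, not something the patch decides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the number of possible CPUs. */
#define MODEL_NR_CPUS 8

/* Toy state: is this CPU's TSS page visible in the user page tables? */
static bool tss_user_mapped[MODEL_NR_CPUS];

/* Hypothetical "CPU comes up" callback: expose only this CPU's page. */
int model_tss_cpu_online(unsigned int cpu)
{
	if (cpu >= MODEL_NR_CPUS)
		return -1;
	tss_user_mapped[cpu] = true;
	return 0;
}

/* Hypothetical "CPU goes down" callback (assumption: we unmap again). */
int model_tss_cpu_offline(unsigned int cpu)
{
	if (cpu >= MODEL_NR_CPUS)
		return -1;
	tss_user_mapped[cpu] = false;
	return 0;
}

/* How many per-CPU pages are currently exposed to user page tables. */
size_t model_count_exposed(void)
{
	size_t n = 0;

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (tss_user_mapped[cpu])
			n++;
	return n;
}
```

With two CPUs brought online, only two pages are exposed, instead of
one page for every possible CPU slot.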

> + unsigned long va = (unsigned long)&per_cpu(cpu_tss_rw, cpu);
> + phys_addr_t pa = per_cpu_ptr_to_phys((void *)va);
> + pte_t *target_pte;
> +
> + target_pte = pti_user_pagetable_walk_pte(va);

This function only exists if CONFIG_X86_VSYSCALL_EMULATION is set, so
it won't even compile under some (very unusual) configurations.

The "disgusting" part is that I think it could/should share more code
with the vsyscall case, and the whole target-pte checking and setting
should be shared too.

Because it isn't shared, I react to this:

> + set_pte(target_pte, pfn_pte(pa >> PAGE_SHIFT, PAGE_KERNEL));

Hmm. The vsyscall code just does

*target_pte = ..

without any set_pte() stuff. Do we want/need the PVOP cases, and if
so, why doesn't the vsyscall case need it?
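
Something like the following is the shape of sharing meant here,
written as a userspace toy rather than kernel code. model_pte_t,
model_set_pte() and model_clone_one_pte() are invented names, the bit
layout is simplified, and refusing to overwrite an already-present
entry is an assumption, not what either existing caller does. The
point is only that both the vsyscall and the TSS case would go through
one place that checks the target and one place that writes it, so the
set_pte()-vs-plain-store question gets answered once.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_pte_t;		/* toy stand-in for pte_t */

#define MODEL_PAGE_SHIFT	12
#define MODEL_PTE_PRESENT	0x1ULL

/* Single write path: whatever set_pte() needs to be, it lives here. */
void model_set_pte(model_pte_t *ptep, model_pte_t val)
{
	*ptep = val;
}

/*
 * Shared helper for every caller that clones one page into the user
 * page tables. Returns 0 on success, -1 if the page table walk gave
 * no target, -2 if the target is already populated (assumed policy).
 */
int model_clone_one_pte(model_pte_t *target, uint64_t pa, uint64_t prot)
{
	uint64_t frame;

	if (!target)
		return -1;
	if (*target & MODEL_PTE_PRESENT)
		return -2;

	/* Drop the offset within the page, keep the page frame bits. */
	frame = (pa >> MODEL_PAGE_SHIFT) << MODEL_PAGE_SHIFT;
	model_set_pte(target, frame | prot | MODEL_PTE_PRESENT);
	return 0;
}
```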

Anyway, I love the approach, and how this gets rid of the nasty
trampoline, so no real complaints, just "this needs some fixups".

Linus