Re: hv_hypercall_pg page permissions

From: Peter Zijlstra
Date: Tue Jun 16 2020 - 06:18:40 EST


On Tue, Jun 16, 2020 at 09:23:18AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 15, 2020 at 07:49:41PM +0000, Dexuan Cui wrote:
> > I did this experiment:
> > 1. export vmalloc_exec and ptdump_walk_pgd_level_checkwx.
> > 2. write a test module that calls them.
> > 3. It turns out that every call of vmalloc_exec() triggers such a warning.
> >
> > vmalloc_exec() uses PAGE_KERNEL_EXEC, which is defined as
> > (__PP|__RW| 0|___A| 0|___D| 0|___G)
> >
> > It looks like the logic in note_page() is: for_each_RW_page, if the NX bit
> > is unset, then report the page as an insecure W+X mapping. IMO this
> > explains the warning?
>
> It does. But it also means every other user of PAGE_KERNEL_EXEC
> should trigger this, of which there are a few (kexec, tboot, hibernate,
> early xen pv mapping, early SEV identity mapping)

There are only 3 users in the entire tree afaict:

arch/arm64/kernel/probes/kprobes.c: page = vmalloc_exec(PAGE_SIZE);
arch/x86/hyperv/hv_init.c: hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
kernel/module.c: return vmalloc_exec(size);

And that last one is a weak function that any arch that has STRICT_RWX
ought to override.

> We really shouldn't create mappings like this by default. Either we
> need to flip PAGE_KERNEL_EXEC itself based on the needs of the above
> users, or add another define to overload vmalloc_exec as there is no
> other user of that for x86.

We really should get rid of the two !module users of this though; both
x86 and arm64 have STRICT_RWX and sufficient primitives to DTRT.

What is HV even trying to do with that page? AFAICT it never actually
writes to it, it seems to give the physical address to an MSR (which I
suspect then writes crud into the page for us from host context).

Suggesting the page really only needs to be RX.
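
To make the W+X point concrete, here is a small userspace model of the
check as Dexuan describes it above. It is a sketch only, not the actual
note_page() code: the PTE_* macros use the architectural x86-64 bit
positions, and pte_is_wx() and pte_make_ro() are hypothetical helper
names made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86-64 page-table entry bits (architectural positions). */
#define PTE_PRESENT	(1ULL << 0)
#define PTE_RW		(1ULL << 1)
#define PTE_ACCESSED	(1ULL << 5)
#define PTE_DIRTY	(1ULL << 6)
#define PTE_GLOBAL	(1ULL << 8)
#define PTE_NX		(1ULL << 63)

/* Same bit set as the PAGE_KERNEL_EXEC quoted above: P|RW|A|D|G, NX clear. */
#define PROT_KERNEL_EXEC \
	(PTE_PRESENT | PTE_RW | PTE_ACCESSED | PTE_DIRTY | PTE_GLOBAL)

/* Hypothetical helper: a mapping is W+X if it is writable and NX is unset. */
static bool pte_is_wx(uint64_t prot)
{
	return (prot & PTE_RW) && !(prot & PTE_NX);
}

/* Hypothetical helper modelling the effect of set_memory_ro(): drop RW. */
static uint64_t pte_make_ro(uint64_t prot)
{
	return prot & ~PTE_RW;
}
```

With these definitions pte_is_wx(PROT_KERNEL_EXEC) is true, which matches
the warning, while pte_is_wx(pte_make_ro(PROT_KERNEL_EXEC)) is false: an
RX page would not be reported.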

On top of that, vmalloc_exec() gets us a page from the entire vmalloc
range, which can be outside of the 2G executable range, which seems to
suggest vmalloc_exec() is wrong too and all this works by accident.
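
A rough illustration of the range problem, assuming the concern is that
kernel code wants to stay within signed 32-bit displacement reach (+/-2G)
of the kernel text. The EX_* addresses are the documented x86-64 4-level
layout bases without KASLR and are used here purely as example values;
fits_rel32() is a hypothetical helper, and it ignores the detail that a
real rel32 is taken from the end of the instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example base addresses only (x86-64, 4-level paging, no KASLR). */
#define EX_KERNEL_TEXT	0xffffffff80000000ULL
#define EX_MODULES	0xffffffffa0000000ULL
#define EX_VMALLOC	0xffffc90000000000ULL

/* Hypothetical helper: can 'to' be reached from 'from' with a rel32? */
static bool fits_rel32(uint64_t from, uint64_t to)
{
	/* Unsigned subtraction wraps; reinterpret as a signed distance. */
	int64_t delta = (int64_t)(to - from);

	return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

Here fits_rel32(EX_KERNEL_TEXT, EX_MODULES) holds (the distance is 512M),
whereas fits_rel32(EX_KERNEL_TEXT, EX_VMALLOC) does not, since the
vmalloc area sits tens of terabytes below the kernel text.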

How about something like this:


diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a54c6a401581..82a3a4a9481f 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -375,12 +375,15 @@ void __init hyperv_init(void)
 	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-	hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
+	hv_hypercall_pg = module_alloc(PAGE_SIZE);
 	if (hv_hypercall_pg == NULL) {
 		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 		goto remove_cpuhp_state;
 	}
 
+	set_memory_ro((unsigned long)hv_hypercall_pg, 1);
+	set_memory_x((unsigned long)hv_hypercall_pg, 1);
+
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	hypercall_msr.enable = 1;
 	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);