RE: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush

From: KY Srinivasan
Date: Mon Apr 10 2017 - 18:03:25 EST




> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> Sent: Monday, April 10, 2017 7:44 AM
> To: KY Srinivasan <kys@xxxxxxxxxxxxx>
> Cc: devel@xxxxxxxxxxxxxxxxxxxxxx; x86@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>;
> Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>;
> H. Peter Anvin <hpa@xxxxxxxxx>; Steven Rostedt <rostedt@xxxxxxxxxxx>;
> Jork Loeser <Jork.Loeser@xxxxxxxxxxxxx>
> Subject: Re: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
>
> KY Srinivasan <kys@xxxxxxxxxxxxx> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> >> Sent: Friday, April 7, 2017 4:27 AM
> >> To: devel@xxxxxxxxxxxxxxxxxxxxxx; x86@xxxxxxxxxx
> >> Cc: linux-kernel@xxxxxxxxxxxxxxx; KY Srinivasan <kys@xxxxxxxxxxxxx>;
> >> Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>;
> >> Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>;
> >> H. Peter Anvin <hpa@xxxxxxxxx>; Steven Rostedt <rostedt@xxxxxxxxxxx>;
> >> Jork Loeser <Jork.Loeser@xxxxxxxxxxxxx>
> >> Subject: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
> >>
> >> The Hyper-V host can suggest that the guest use a hypercall for remote TLB
> >> flushes; this is supposed to be faster than IPIs.
> >>
> >> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
> >> we need to put the input somewhere in memory, and we don't really want to
> >> allocate memory on each call, so we pre-allocate per-cpu memory areas on
> >> boot. These areas are of fixed size; limit them to an arbitrary 16 entries
> >> (16 gvas can specify 16 * 4096 pages).
> >>
> >> pv_ops patching happens very early, so we need to separate
> >> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
> >>
> >> It would be possible and easy to implement local TLB flushing too, and there
> >> is even a hint for that. However, I don't see room for optimization on the
> >> host side, as both the hypercall and a native TLB flush result in a vmexit.
> >> The hint is also not set on modern Hyper-V versions.
> >>
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> >> ---
> >> arch/x86/hyperv/Makefile | 2 +-
> >> arch/x86/hyperv/hv_init.c | 2 +
> >> arch/x86/hyperv/mmu.c | 128
> >> +++++++++++++++++++++++++++++++++++++
> >> arch/x86/include/asm/mshyperv.h | 2 +
> >> arch/x86/include/uapi/asm/hyperv.h | 7 ++
> >> arch/x86/kernel/cpu/mshyperv.c | 1 +
> >> 6 files changed, 141 insertions(+), 1 deletion(-)
> >> create mode 100644 arch/x86/hyperv/mmu.c
> >>
> >> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> >> index 171ae09..367a820 100644
> >> --- a/arch/x86/hyperv/Makefile
> >> +++ b/arch/x86/hyperv/Makefile
> >> @@ -1 +1 @@
> >> -obj-y := hv_init.o
> >> +obj-y := hv_init.o mmu.o
> >> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> >> index 1c14088..2cf8a98 100644
> >> --- a/arch/x86/hyperv/hv_init.c
> >> +++ b/arch/x86/hyperv/hv_init.c
> >> @@ -163,6 +163,8 @@ void hyperv_init(void)
> >> hypercall_msr.guest_physical_address =
> >> vmalloc_to_pfn(hv_hypercall_pg);
> >> wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> >>
> >> + hyper_alloc_mmu();
> >> +
> >> /*
> >> * Register Hyper-V specific clocksource.
> >> */
> >> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> >> new file mode 100644
> >> index 0000000..fb487cb
> >> --- /dev/null
> >> +++ b/arch/x86/hyperv/mmu.c
> >> @@ -0,0 +1,128 @@
> >> +#include <linux/types.h>
> >> +#include <linux/hyperv.h>
> >> +#include <linux/slab.h>
> >> +#include <asm/mshyperv.h>
> >> +#include <asm/tlbflush.h>
> >> +#include <asm/msr.h>
> >> +#include <asm/fpu/api.h>
> >> +
> >> +/*
> >> + * Arbitrary number; we need to pre-allocate per-cpu struct for doing TLB
> >> + * flush hypercalls and we need to pick a size. '16' means we'll be able
> >> + * to flush 16 * 4096 pages (256MB) with one hypercall.
> >> + */
> >> +#define HV_MMU_MAX_GVAS 16
> >
> > Did you experiment with different sizes here?
>
> Actually, I never saw the kernel trying to flush more than 4096 pages, so we
> could get away with HV_MMU_MAX_GVAS=1. I went through the code and didn't see
> any explicit limit on the number of pages we can ask to flush, so that may
> just be a coincidence. Each additional gva_list item requires only 8 bytes,
> so I put an arbitrary '16' here.
>
> >> +
> >> +/* HvFlushVirtualAddressSpace*, HvFlushVirtualAddressList hypercalls */
> >> +struct hv_flush_pcpu {
> >> + struct {
> >> + __u64 address_space;
> >> + __u64 flags;
> >> + __u64 processor_mask;
> >> + __u64 gva_list[HV_MMU_MAX_GVAS];
> >> + } flush;
> >> +
> >> + spinlock_t lock;
> >> +};
> >> +
> > We may be supporting more than 64 CPUs in this hypercall. I am going to
> > inquire with the Windows folks and get back to you.
>
> Thanks! It is even mentioned in the specification:
> "Future versions of the hypervisor may support more than 64 virtual
> processors per partition. In that case, a new field will be added to the
> flags value that allows the caller to define the 'processor bank' to which
> the processor mask applies."
>
> We, however, need to know where to put this in flags.

There is a new hypercall for targeting more than 64 VCPUs. For now, we can check
whether the CPU mask specifies more than 64 CPUs and use the native call if that
is the case.
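
Something along these lines, perhaps (just a rough sketch against the flush
routine this patch adds; the cpumask_last() cutoff and the assumption that
Linux CPU numbers map 1:1 to Hyper-V VP indices are mine, not taken from the
patch):

#include <linux/cpumask.h>
#include <linux/mm_types.h>
#include <asm/tlbflush.h>

/*
 * Sketch only: HvFlushVirtualAddress{Space,List} carries a single 64-bit
 * processor_mask, so fall back to the IPI-based native flush whenever the
 * mask names a CPU the hypercall input cannot describe.
 */
static void hyperv_flush_tlb_others(const struct cpumask *cpus,
                                    struct mm_struct *mm,
                                    unsigned long start, unsigned long end)
{
        if (cpumask_empty(cpus))
                return;

        if (cpumask_last(cpus) >= 64) {
                /* The mask doesn't fit into processor_mask, use IPIs. */
                native_flush_tlb_others(cpus, mm, start, end);
                return;
        }

        /* ... fill the per-cpu hv_flush_pcpu area and issue the hypercall ... */
}

The new hypercall for more than 64 VCPUs can then be wired up as a follow-up
once we hear back from the Windows folks.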

K. Y