RE: [RFC PATCH 00/18] Microsoft Hypervisor root partition ioctl interface

From: Michael Kelley
Date: Mon Feb 08 2021 - 15:54:35 EST


From: Nuno Das Neves <nunodasneves@xxxxxxxxxxxxxxxxxxx> Sent: Friday, November 20, 2020 4:30 PM
>
> This patch series provides a userspace interface for creating and running guest
> virtual machines while running on the Microsoft Hypervisor [0].
>
> Since managing guest machines can only be done when Linux is the root partition,
> this series depends on the RFC already posted by Wei Liu:
> https://lore.kernel.org/linux-hyperv/20201105165814.29233-1-wei.liu@xxxxxxxxxx/T/#t
>
> The first two patches provide some helpers for converting hypervisor status
> codes to linux error codes, and easily printing hypervisor status codes to dmesg
> for debugging.
>
> Hyper-V related headers asm-generic/hyperv-tlfs.h and x86/asm/hyperv-tlfs.h are
> split into uapi and non-uapi. The uapi versions contain structures used in both
> the ioctl interface and the kernel.
>
> The mshv API is introduced in virt/mshv/mshv_main.c. As each interface is
> introduced, documentation is added in Documentation/virt/mshv/api.rst.
> The API is file-desciptor based, like KVM. The entry point is /dev/mshv.
>
> /dev/mshv ioctls:
> MSHV_REQUEST_VERSION
> MSHV_CREATE_PARTITION
>
> Partition (vm) ioctls:
> MSHV_MAP_GUEST_MEMORY, MSHV_UNMAP_GUEST_MEMORY
> MSHV_INSTALL_INTERCEPT
> MSHV_ASSERT_INTERRUPT
> MSHV_GET_PARTITION_PROPERTY, MSHV_SET_PARTITION_PROPERTY
> MSHV_CREATE_VP
>
> Vp (vcpu) ioctls:
> MSHV_GET_VP_REGISTERS, MSHV_SET_VP_REGISTERS
> MSHV_RUN_VP
> MSHV_GET_VP_STATE, MSHV_SET_VP_STATE
> mmap() (register page)
>
> [0] Hyper-V is more well-known, but it really refers to the whole stack
> including the hypervisor and other components that run in Windows kernel
> and userspace.
>
> Nuno Das Neves (18):
> x86/hyperv: convert hyperv statuses to linux error codes
> asm-generic/hyperv: convert hyperv statuses to strings
> virt/mshv: minimal mshv module (/dev/mshv/)
> virt/mshv: request version ioctl
> virt/mshv: create partition ioctl
> virt/mshv: create, initialize, finalize, delete partition hypercalls
> virt/mshv: withdraw memory hypercall
> virt/mshv: map and unmap guest memory
> virt/mshv: create vcpu ioctl
> virt/mshv: get and set vcpu registers ioctls
> virt/mshv: set up synic pages for intercept messages
> virt/mshv: run vp ioctl and isr
> virt/mshv: install intercept ioctl
> virt/mshv: assert interrupt ioctl
> virt/mshv: get and set vp state ioctls
> virt/mshv: mmap vp register page
> virt/mshv: get and set partition property ioctls
> virt/mshv: Add enlightenment bits to create partition ioctl
>
> .../userspace-api/ioctl/ioctl-number.rst | 2 +
> Documentation/virt/mshv/api.rst | 173 ++
> arch/x86/Kconfig | 2 +
> arch/x86/hyperv/Kconfig | 22 +
> arch/x86/hyperv/Makefile | 4 +
> arch/x86/hyperv/hv_init.c | 2 +-
> arch/x86/hyperv/hv_proc.c | 40 +-
> arch/x86/include/asm/hyperv-tlfs.h | 44 +-
> arch/x86/include/asm/mshyperv.h | 1 +
> arch/x86/include/uapi/asm/hyperv-tlfs.h | 1312 +++++++++++
> arch/x86/kernel/cpu/mshyperv.c | 16 +
> include/asm-generic/hyperv-tlfs.h | 324 ++-
> include/asm-generic/mshyperv.h | 3 +
> include/linux/mshv.h | 61 +
> include/uapi/asm-generic/hyperv-tlfs.h | 160 ++
> include/uapi/linux/mshv.h | 109 +
> virt/mshv/mshv_main.c | 2054 +++++++++++++++++
> 17 files changed, 4178 insertions(+), 151 deletions(-)
> create mode 100644 Documentation/virt/mshv/api.rst
> create mode 100644 arch/x86/hyperv/Kconfig
> create mode 100644 arch/x86/include/uapi/asm/hyperv-tlfs.h
> create mode 100644 include/linux/mshv.h
> create mode 100644 include/uapi/asm-generic/hyperv-tlfs.h
> create mode 100644 include/uapi/linux/mshv.h
> create mode 100644 virt/mshv/mshv_main.c
>
> --
> 2.25.1

I finally made it through reviewing this patch series. Nice
work -- to you, and to Lillian as the original author of significant
portions! There's a lot code, but it is well organized for reviewing
and overall is done well.

I have a three general comments:

1) Historically we have very precisely specified the layout of data
structures that are shared with Hyper-V. Each field has an explicit
width (i.e., u16, u32, u64, etc.) and we have avoided field types that
lack an explicit width (int, enum, bool, etc.). These patches make
liberal use of enum types in the Hyper-V data structures, and I saw
one occurrence of bool. While treating enum and bool as 32 bits
works, I have a concern that such specifications aren't consistent
with the original rigor we tried to use.

Related, there are several places where the proper layout depends
on the compiler inserting padding (and not inserting padding in the
wrong places) to achieve the needed alignment. In my view, we
should be explicitly adding the padding. A couple years back at
Vitaly Kuznetsov's initiative, we added __packed on all the data
structures to instruct the compiler to not add padding, so as to
prevent padding being added at any inappropriate places.

I started by flagging all of these places I saw either of these two
Issues, but I stopped doing so in some of the later patches, figuring
that you could find the issues across the entire series.

2) With all the new hypercalls added with this patch series, and with
Wei Liu's patch series for Linux in the root partition, I've noticed that
we're inconsistent in how the hypercall status is checked. The
current code works, but is sloppy with types and doesn't always
conform to the letter of the TLFS. Your new hv_status_to_errno() is
a nice addition, but I think we would be well served by using a
consistent pattern. I'm planning to send out a separate email to
the linux-hyperv mailing list with a specific suggestion that we can
all review and comment on. Once we have agreement, we can do
a cleanup exercise on existing code and on recent patches.

3) I've flagged a few places where the code does not handle configurations
where PAGE_SIZE is other than 4 Kbytes. While this will never happen
on x86/x64, it could happen on other architectures like ARM64. Of course,
we may never want to run Linux in the root partition with a page size
other than 4 Kbytes, even on ARM64, so I'm OK with not fixing all these
places. But I've flagged some places where HV_HYP_PAGE_SIZE would
be more appropriate than PAGE_SIZE (and similar) and I think it makes
sense to fix those now, if just to express that the usage is tied to the
page size used by the Hyper-V interface, and not the guest page size.

I'll also send replies to many of the individual patches with specific
comments embedded. I have not given "Reviewed-by:" on any of the
patches since they were submitted as RFC, but I can do so for a few
of the patches if that would be helpful.

Michael