[RFC PATCH 0/2] KVM: x86: Relay a nested Hyper-V root's vmbus posts to L0

From: Robert Nowotny

Date: Wed Jun 17 2026 - 11:12:25 EST


This RFC asks for direction on a small KVM/x86 addition before adding a selftest
and an SVM counterpart. It lets a nested Hyper-V root partition's vmbus come up
when the L1 hypervisor runs under KVM with a userspace VMM that owns the host
vmbus endpoint.

Patch 1 renames nested_evmcs_l2_tlb_flush_enabled() to
nested_evmcs_l2_direct_hypercall_enabled(), since the predicate is really "L1
granted this L2 the eVMCS direct-hypercall facility" and a second caller now
shares it. No functional change.

Patch 2 adds the relay.

The userspace consumer is OpenVMM (https://github.com/microsoft/openvmm); the
companion change that enables this capability with the relay bitmask will be
posted to OpenVMM later.

Problem
-------
A Windows guest that enables Hyper-V/VBS runs its own kernel as the root
partition of a nested hypervisor, i.e. as an L2 guest: guest kernel ->
nested hypervisor (L1) -> KVM (L0). The root's vmbus never connects. Its
HvPostMessage(InitiateContact) is an L2 VMCALL that exits to L0 and is
reflected up to L1, which has no path to forward it to the userspace VMM. The
guest bugchecks with stop code 0x7B (INACCESSIBLE_BOOT_DEVICE) early in boot.

What the patch does
-------------------
Add a per-VM capability whose argument is a bitmask of the nested Hyper-V
hypercall classes userspace wants kept in L0 (HvPostMessage, HvSignalEvent).
For a selected class, and when L1 has authorized the L2 for direct nested
hypercalls (nested_evmcs_l2_direct_hypercall_enabled(), the gate KVM already
honors for the L2 TLB-flush hypercall), the L2 VMCALL is handled in L0 instead
of being reflected to L1: KVM clears the nested bit, translates the L2 GPA in the
input parameter to an L1 GPA via the nested MMU, and lets the existing
hypercall path deliver the post to userspace via KVM_EXIT_HYPERV, exactly as
for a non-nested guest.
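A minimal sketch of the relay decision above, written as standalone C rather
than KVM code. The hypercall input value fields (call code in bits 15:0, fast
bit 16, nested bit 31) and the call codes 0x5c (HvPostMessage) and 0x5d
(HvSignalEvent) follow the TLFS. The RELAY_* bit values, the function names,
and the assumption that only inputs with the nested bit set are candidates are
hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hyper-V hypercall input value fields (TLFS). */
#define HV_CALL_CODE_MASK   0xffffULL      /* bits 15:0: call code */
#define HV_HYPERCALL_FAST   (1ULL << 16)   /* bit 16: register-based call */
#define HV_HYPERCALL_NESTED (1ULL << 31)   /* bit 31: aimed at the L0 hypervisor */

#define HVCALL_POST_MESSAGE 0x005c
#define HVCALL_SIGNAL_EVENT 0x005d

/* Hypothetical bit assignments for the capability's args[0]. */
#define RELAY_POST_MESSAGE  (1ULL << 0)
#define RELAY_SIGNAL_EVENT  (1ULL << 1)

/* Map a call code to its relay-class bit, or 0 if it is never relayed. */
static uint64_t relay_class(uint16_t code)
{
	switch (code) {
	case HVCALL_POST_MESSAGE:
		return RELAY_POST_MESSAGE;
	case HVCALL_SIGNAL_EVENT:
		return RELAY_SIGNAL_EVENT;
	default:
		return 0;
	}
}

/*
 * Decide whether L0 keeps an L2 VMCALL instead of reflecting it to L1.
 * When it does, the nested bit is cleared in *input so the ordinary
 * (non-nested) hypercall path can take it from there.
 */
static bool keep_in_l0(uint64_t *input, uint64_t relay_mask,
		       bool l2_direct_hypercall_enabled)
{
	uint64_t cls = relay_class((uint16_t)(*input & HV_CALL_CODE_MASK));

	if (!(*input & HV_HYPERCALL_NESTED))
		return false;	/* not aimed at L0: reflect to L1 */
	if (!cls || !(relay_mask & cls))
		return false;	/* class not selected by userspace */
	if (!l2_direct_hypercall_enabled)
		return false;	/* L1 has not authorized this L2 */

	*input &= ~HV_HYPERCALL_NESTED;
	return true;
}
```

All three conditions must hold, so a call whose class userspace selected is
still reflected to L1 when L1 has not authorized the L2 for direct hypercalls.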

Why this belongs in the kernel
------------------------------
The message handling already lives in userspace and does not move: a non-nested
HvPostMessage exits to userspace today via KVM_EXIT_HYPERV, and the relayed
nested post takes the same exit. Only two steps cannot be done in userspace with
the current uAPI, and both require kernel-only primitives:

  1. Suppressing nested exit reflection. The "keep this L2 VMCALL in L0 instead
     of reflecting to L1" decision is made in nested_vmx_reflect_vmexit(); KVM
     does not exit to userspace on a nested L2 VM-exit before deciding
     reflection, and adding such an exit would be a much broader and riskier
     ABI. A nested exit also cannot be cleanly reflected to L1 after a userspace
     round-trip, which is why the decision stays in the kernel.
  2. Translating the L2 GPA to an L1 GPA, which needs the nested MMU / shadow
     EPT that userspace cannot walk.
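A toy model of step 2, assuming a slow (memory-based) hypercall whose input
page GPA is passed in RDX, as the TLFS specifies for x64, and assuming the
parameter block stays within one page so only the frame number changes. L1's
EPT is reduced to a flat table of frame mappings (the struct toy_ept_entry type
and all function names are hypothetical); real KVM walks L1's EPT through the
nested MMU, which is the state userspace cannot see. Fast hypercalls pass their
parameters in registers and have no GPA to translate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT        12
#define PAGE_OFFSET_MASK  ((1ULL << PAGE_SHIFT) - 1)
#define INVALID_GPA       (~0ULL)
#define HV_HYPERCALL_FAST (1ULL << 16)   /* TLFS: register-based call */

/* Toy stand-in for L1's EPT: one L2 guest frame -> one L1 guest frame. */
struct toy_ept_entry {
	uint64_t l2_gfn;
	uint64_t l1_gfn;
};

/* Translate an L2 GPA to an L1 GPA, keeping the offset within the page. */
static uint64_t toy_translate_l2_gpa(const struct toy_ept_entry *ept,
				     size_t n, uint64_t l2_gpa)
{
	uint64_t gfn = l2_gpa >> PAGE_SHIFT;

	for (size_t i = 0; i < n; i++)
		if (ept[i].l2_gfn == gfn)
			return (ept[i].l1_gfn << PAGE_SHIFT) |
			       (l2_gpa & PAGE_OFFSET_MASK);
	return INVALID_GPA;	/* unmapped: a real walk would report a fault */
}

/*
 * Rewrite the input-page GPA in *rdx from L2 to L1 space for a slow call.
 * Returns false, leaving *rdx untouched, if the page is not mapped.
 */
static bool fixup_input_gpa(uint64_t input, uint64_t *rdx,
			    const struct toy_ept_entry *ept, size_t n)
{
	uint64_t l1_gpa;

	if (input & HV_HYPERCALL_FAST)
		return true;	/* parameters live in registers */

	l1_gpa = toy_translate_l2_gpa(ept, n, *rdx);
	if (l1_gpa == INVALID_GPA)
		return false;
	*rdx = l1_gpa;
	return true;
}
```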

The relayable set is a userspace-supplied bitmask
-------------------------------------------------
args[0] selects which nested Hyper-V hypercall classes to keep in L0. The
relay mechanism lives in the kernel, the choice of which calls to relay belongs
to userspace, and the kernel carries no vmbus-specific policy. New relayable
nested hypercalls can be added without another kernel change.
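On the VMM side, enabling the relay would be an ordinary per-VM KVM_ENABLE_CAP
call on the fd returned by KVM_CREATE_VM. In this sketch, struct kvm_enable_cap
and KVM_ENABLE_CAP are the existing uAPI from <linux/kvm.h>; the capability
number is the RFC's placeholder 249, and the macro names and bit positions are
hypothetical until the uAPI is settled.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Placeholder from this RFC; not an assigned capability number. */
#define KVM_CAP_HV_NESTED_RELAY_RFC 249

/* Hypothetical bit assignments for args[0]. */
#define RELAY_POST_MESSAGE (1ULL << 0)
#define RELAY_SIGNAL_EVENT (1ULL << 1)

/* Fill a kvm_enable_cap request carrying the relay bitmask in args[0]. */
static void build_relay_cap(struct kvm_enable_cap *cap, __u64 mask)
{
	memset(cap, 0, sizeof(*cap));
	cap->cap = KVM_CAP_HV_NESTED_RELAY_RFC;
	cap->args[0] = mask;
}

/* Enable the relay on a VM fd. Returns 0 on success or -errno. */
static int enable_nested_relay(int vm_fd, __u64 mask)
{
	struct kvm_enable_cap cap;

	build_relay_cap(&cap, mask);
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -errno;
	return 0;
}
```

A VMM that only needs vmbus connection setup would pass RELAY_POST_MESSAGE
alone; widening the set later is a userspace change to the mask.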

Scope and limitations
---------------------
  - VMX-only; no SVM counterpart yet.
  - The capability number 249 is a placeholder pending assignment.
  - No selftest yet (this is an RFC for direction). A selftest and, if the
    relay stays, an SVM path would come with the non-RFC series.

Tooling transparency
--------------------
This work was developed with AI assistance (Claude, claude-opus-4-8), reflected
in each patch's Assisted-by tag. The assistant analyzed the nested-exit
reflection and Hyper-V hypercall paths, drafted the comments and changelogs, and
cross-checked the behavior against the TLFS and the existing L2 TLB-flush
handling. The mechanism was derived from runtime analysis of a stock Windows
guest that bugchecks 0x7B without the relay and boots with it. The submitter has
reviewed the change in full and takes responsibility for it.

Testing
-------
The relay mechanism was validated on a Proxmox VE 7.0.2 kernel (the same logic
ported to that tree): a stock nested Windows guest under a userspace VMM that
owns the host vmbus endpoint fails to bring up its root vmbus (0x7B) without the
capability and boots to the full desktop with it. checkpatch is clean on both
patches. A mainline KVM_INTEL=m KVM_AMD=m KVM_WERROR=y build and a KVM selftest
are still to come with the non-RFC series.


Yours sincerely

Ing. Robert Nowotny
CTO, Executive Technical Director

Rotek GmbH
