Re: [PATCH 1/6] KVM: Document KVM_PRE_FAULT_MEMORY ioctl

From: Isaku Yamahata
Date: Mon Apr 22 2024 - 13:55:49 EST


On Fri, Apr 19, 2024 at 04:59:22AM -0400,
Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:

> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> Add documentation for the KVM_PRE_FAULT_MEMORY ioctl. [1]
>
> It populates guest memory, but does not perform any additional
> technology-specific initialization [2]. For example, CoCo-related
> operations won't be performed. Concretely for TDX, this API won't invoke
> TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND(). Vendor-specific APIs are required
> for such operations.
>
> The key point is to adopt a vcpu ioctl instead of a VM ioctl. First,
> populating guest memory requires a vcpu. If it were a VM ioctl, we would
> need to pick one vcpu somehow. Secondly, a vcpu ioctl allows each vcpu to
> invoke this ioctl in parallel, which helps it scale with guest memory
> size, e.g., hundreds of GB.
>
> [1] https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@xxxxxxxxxx/
> [2] https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@xxxxxxxxxx/
>
> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> Message-ID: <9a060293c9ad9a78f1d8994cfe1311e818e99257.1712785629.git.isaku.yamahata@xxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> Documentation/virt/kvm/api.rst | 50 ++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..bbcaa5d2b54b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6352,6 +6352,56 @@ a single guest_memfd file, but the bound ranges must not overlap).
>
> See KVM_SET_USER_MEMORY_REGION2 for additional details.
>
> +4.143 KVM_PRE_FAULT_MEMORY
> +------------------------
> +
> +:Capability: KVM_CAP_PRE_FAULT_MEMORY
> +:Architectures: none
> +:Type: vcpu ioctl
> +:Parameters: struct kvm_pre_fault_memory (in/out)
> +:Returns: 0 on success, < 0 on error
> +
> +Errors:
> +
> + ========== ===============================================================
> + EINVAL The specified `gpa` and `size` were invalid (e.g. not
> + page aligned).
> + ENOENT The specified `gpa` is outside defined memslots.
> + EINTR An unmasked signal is pending and no page was processed.
> + EFAULT The parameter address was invalid.
> + EOPNOTSUPP Mapping memory for a GPA is unsupported by the
> + hypervisor, and/or for the current vCPU state/mode.

Please also add EIO to this table:

 EIO An unexpected error happened.

> + ========== ===============================================================
> +
> +::
> +
> + struct kvm_pre_fault_memory {
> + /* in/out */
> + __u64 gpa;
> + __u64 size;
> + /* in */
> + __u64 flags;
> + __u64 padding[5];
> + };
> +
> +KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
> +for the current vCPU state. KVM maps memory as if the vCPU generated a
> +stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
> +CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed.
> +
> +In some cases, multiple vCPUs might share the page tables. In this
> +case, the ioctl can be called in parallel.
> +
> +Shadow page tables cannot support this ioctl because they
> +are indexed by virtual address or nested guest physical address.
> +Calling this ioctl when the guest is using shadow page tables (for
> +example because it is running a nested guest with nested page tables)
> +will fail with `EOPNOTSUPP` even if `KVM_CHECK_EXTENSION` reports
> +the capability to be present.
> +
> +`flags` must currently be zero.

`flags` and `padding` must currently be zero.
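
As a minimal sketch of what the zero requirement on `flags` and `padding`
means for callers, userspace could fill the argument roughly as below. This
is only an illustration under stated assumptions: `struct pre_fault_args` is
a local mirror of the struct quoted above (so it builds without the new uapi
header), and the helper name `pre_fault_args_init`, the caller-supplied
`page_size`, and the overflow check are all hypothetical, not part of the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_pre_fault_memory as quoted in the patch. */
struct pre_fault_args {
	uint64_t gpa;        /* in/out */
	uint64_t size;       /* in/out */
	uint64_t flags;      /* in, must currently be zero */
	uint64_t padding[5]; /* kept zero as well */
};

/* Eight 64-bit fields, so the layout should be exactly 64 bytes. */
static_assert(sizeof(struct pre_fault_args) == 64, "unexpected layout");

/*
 * Hypothetical helper: validate the range the way the EINVAL description
 * suggests (page alignment), then zero the whole struct so that flags and
 * padding are guaranteed to be zero. The wrap-around check is an extra
 * assumption made for this sketch.
 */
static int pre_fault_args_init(struct pre_fault_args *args, uint64_t gpa,
			       uint64_t size, uint64_t page_size)
{
	/* page_size must be a non-zero power of two. */
	if (page_size == 0 || (page_size & (page_size - 1)))
		return -EINVAL;
	/* Both gpa and size must be page aligned. */
	if ((gpa | size) & (page_size - 1))
		return -EINVAL;
	/* Assumption: reject ranges that wrap around the 64-bit space. */
	if (gpa + size < gpa)
		return -EINVAL;

	memset(args, 0, sizeof(*args));
	args->gpa = gpa;
	args->size = size;
	return 0;
}
```

The filled struct would then be passed to the vcpu file descriptor with the
KVM_PRE_FAULT_MEMORY request; that call is left out here because it needs
headers that include this patch.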

> +
> +
> 5. The kvm_run structure
> ========================
>
> --
> 2.43.0
>
>
>
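
To illustrate the parallelism argument from the commit message, here is a
rough sketch of how userspace might carve one GPA range into page-granular
slices, one per vCPU thread, so that each thread can issue the ioctl on its
own slice. The names `struct gpa_range` and `vcpu_slice` are hypothetical,
and the helper assumes `nr` and `page_size` are non-zero and that `gpa` and
`size` are already page aligned.

```c
#include <stdint.h>

/* Hypothetical description of the slice one vCPU thread would populate. */
struct gpa_range {
	uint64_t gpa;
	uint64_t size;
};

/*
 * Return slice `idx` out of `nr` for the range [gpa, gpa + size).
 * Pages are distributed as evenly as possible: every slice gets
 * pages / nr pages and the first pages % nr slices get one more.
 */
static struct gpa_range vcpu_slice(uint64_t gpa, uint64_t size,
				   uint64_t page_size, unsigned int idx,
				   unsigned int nr)
{
	uint64_t pages = size / page_size;
	uint64_t base = pages / nr;
	uint64_t extra = pages % nr;
	/* Number of pages handed to the slices before this one. */
	uint64_t first = idx * base + (idx < extra ? idx : extra);
	uint64_t count = base + (idx < extra ? 1 : 0);
	struct gpa_range r = { gpa + first * page_size, count * page_size };

	return r;
}
```

Each vCPU thread would copy its slice into the `gpa` and `size` fields of
struct kvm_pre_fault_memory. Whether splitting this way actually helps
depends on the vCPUs sharing page tables, as the documentation text notes.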

--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>