Re: [RFC PATCH 2/8] KVM: arm64: Setup base for hypercall firmware registers

From: Oliver Upton
Date: Wed Nov 03 2021 - 18:18:24 EST


On Tue, Nov 02, 2021 at 12:21:57AM +0000, Raghavendra Rao Ananta wrote:
> The hypercall firmware registers may hold versioning information
> for a particular hypercall service. Before a VM starts, these
> registers are read/write to the user-space. That is, it can freely
> modify the fields as it sees fit for the guest. However, this
> shouldn't be allowed once the VM is started since it may confuse
> the guest as it may have read an older value. As a result, introduce
> a helper interface to convert the registers to read-only once any
> vCPU starts running.
>
> Extend this interface to also clear off all the feature bitmaps of
> the firmware registers upon first write. Since KVM exposes an upper
> limit of the feature-set to user-space via these registers, this
> action will ensure that no new features get enabled by accident if
> the user-space isn't aware of a newly added register.
>
> Since the upcoming changes introduces more firmware registers,
> rename the documentation to PSCI (psci.rst) to a more generic
> hypercall.rst.
>
> Signed-off-by: Raghavendra Rao Ananta <rananta@xxxxxxxxxx>
> ---
> .../virt/kvm/arm/{psci.rst => hypercalls.rst} | 24 +++----
> Documentation/virt/kvm/arm/index.rst | 2 +-
> arch/arm64/include/asm/kvm_host.h | 8 +++
> arch/arm64/kvm/arm.c | 7 +++
> arch/arm64/kvm/hypercalls.c | 62 +++++++++++++++++++
> 5 files changed, 90 insertions(+), 13 deletions(-)
> rename Documentation/virt/kvm/arm/{psci.rst => hypercalls.rst} (81%)

nit: consider doing the rename in a separate patch.

> diff --git a/Documentation/virt/kvm/arm/psci.rst b/Documentation/virt/kvm/arm/hypercalls.rst
> similarity index 81%
> rename from Documentation/virt/kvm/arm/psci.rst
> rename to Documentation/virt/kvm/arm/hypercalls.rst
> index d52c2e83b5b8..85dfd682d811 100644
> --- a/Documentation/virt/kvm/arm/psci.rst
> +++ b/Documentation/virt/kvm/arm/hypercalls.rst
> @@ -1,22 +1,19 @@
> .. SPDX-License-Identifier: GPL-2.0
>
> -=========================================
> -Power State Coordination Interface (PSCI)
> -=========================================
> +=======================
> +ARM Hypercall Interface
> +=======================
>
> -KVM implements the PSCI (Power State Coordination Interface)
> -specification in order to provide services such as CPU on/off, reset
> -and power-off to the guest.
> -
> -The PSCI specification is regularly updated to provide new features,
> -and KVM implements these updates if they make sense from a virtualization
> +New hypercalls are regularly added by ARM specifications (or KVM), and

nit: maybe we should use the abstraction of "hypercall service" to refer
to the functional groups of hypercalls, e.g. PSCI and TRNG are hypercall
services.

> +are made available to the guests if they make sense from a virtualization
> point of view.
>
> This means that a guest booted on two different versions of KVM can
> observe two different "firmware" revisions. This could cause issues if
> -a given guest is tied to a particular PSCI revision (unlikely), or if
> -a migration causes a different PSCI version to be exposed out of the
> -blue to an unsuspecting guest.
> +a given guest is tied to a particular version of a specific hypercall
> +(PSCI revision for instance (unlikely)), or if a migration causes a

a particular version of a hypercall service

> +different (PSCI) version to be exposed out of the blue to an unsuspecting
> +guest.
>
> In order to remedy this situation, KVM exposes a set of "firmware
> pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
> @@ -26,6 +23,9 @@ to a convenient value if required.
> The following register is defined:
>
> * KVM_REG_ARM_PSCI_VERSION:
> + KVM implements the PSCI (Power State Coordination Interface)
> + specification in order to provide services such as CPU on/off, reset
> + and power-off to the guest.
>
> - Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
> (and thus has already been initialized)
> diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
> index 78a9b670aafe..e84848432158 100644
> --- a/Documentation/virt/kvm/arm/index.rst
> +++ b/Documentation/virt/kvm/arm/index.rst
> @@ -8,6 +8,6 @@ ARM
> :maxdepth: 2
>
> hyp-abi
> - psci
> + hypercalls
> pvtime
> ptp_kvm
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index d0221fb69a60..0b2502494a17 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -102,6 +102,11 @@ struct kvm_s2_mmu {
> struct kvm_arch_memory_slot {
> };
>
> +struct hvc_reg_desc {
> + bool write_disabled;
> + bool write_attempted;
> +};
> +
> struct kvm_arch {
> struct kvm_s2_mmu mmu;
>
> @@ -137,6 +142,9 @@ struct kvm_arch {
>
> /* Memory Tagging Extension enabled for the guest */
> bool mte_enabled;
> +
> + /* Hypercall firmware registers' information */
> + struct hvc_reg_desc hvc_desc;
> };
>
> struct kvm_vcpu_fault_info {
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 24a1e86d7128..f9a25e439e99 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -630,6 +630,13 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
> if (kvm_vm_is_protected(kvm))
> kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);
>
> + /* Mark the hypercall firmware registers as read-only since
> + * at least once vCPU is about to start running.
> + */
> + mutex_lock(&kvm->lock);
> + kvm->arch.hvc_desc.write_disabled = true;
> + mutex_unlock(&kvm->lock);
> +

This flag is really just an alias for whether any vCPU in the VM has
been started yet. While the ARM KVM code does some bookkeeping around
which vCPUs have been started, the concept is in no way specific to ARM.

It might be nice to hoist vcpu->arch.has_run_once into the generic KVM
code, then build some nice abstractions there to easily determine if any
vCPU in the VM has been started yet.
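
A rough userspace sketch of what such an abstraction could look like.
This is a model, not kernel code: every model_* name is hypothetical,
locking is omitted, and the "same value is still accepted after start"
rule is copied from kvm_fw_regs_block_write() in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in KVM. */
#define MODEL_MAX_VCPUS 4

struct model_vcpu {
	/* Imagined generic counterpart of vcpu->arch.has_run_once. */
	bool has_run_once;
};

struct model_vm {
	struct model_vcpu vcpus[MODEL_MAX_VCPUS];
	size_t nr_vcpus;
};

/* Would be called on a vCPU's first KVM_RUN. */
static void model_vcpu_mark_run(struct model_vm *vm, size_t idx)
{
	assert(idx < vm->nr_vcpus);
	vm->vcpus[idx].has_run_once = true;
}

/* True once any vCPU in the VM has been started. */
static bool model_vm_has_run_once(const struct model_vm *vm)
{
	for (size_t i = 0; i < vm->nr_vcpus; i++) {
		if (vm->vcpus[i].has_run_once)
			return true;
	}
	return false;
}

/*
 * Firmware register write: free before the VM starts, afterwards only
 * a write of the current value is accepted.
 */
static int model_set_fw_reg(const struct model_vm *vm, uint64_t *reg,
			    uint64_t val)
{
	if (model_vm_has_run_once(vm) && val != *reg)
		return -EBUSY;

	*reg = val;
	return 0;
}
```

With a helper like model_vm_has_run_once(), the arch code would not
need its own write_disabled flag at all.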

> return ret;
> }
>
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index d030939c5929..7e873206a05b 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -58,6 +58,12 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> val[3] = lower_32_bits(cycles);
> }
>
> +static u64 *kvm_fw_reg_to_bmap(struct kvm *kvm, u64 fw_reg)
> +{
> + /* No firmware registers supporting hvc bitmaps exits yet */
> + return NULL;
> +}
> +
> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> {
> u32 func_id = smccc_get_function(vcpu);
> @@ -234,15 +240,71 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> return 0;
> }
>
> +static void kvm_fw_regs_sanitize(struct kvm *kvm, struct hvc_reg_desc *hvc_desc)
> +{
> + unsigned int i;
> + u64 *hc_bmap = NULL;
> +
> + mutex_lock(&kvm->lock);
> +
> + if (hvc_desc->write_attempted)
> + goto out;
> +
> + hvc_desc->write_attempted = true;
> +
> + for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
> + hc_bmap = kvm_fw_reg_to_bmap(kvm, fw_reg_ids[i]);
> + if (hc_bmap)
> + *hc_bmap = 0;
> + }

Maybe instead of checking for feature bitmap registers in the full range
of FW registers, you could separately track a list of feature bitmap
regs and just iterate over that.

You could then just stash an array/substructure of feature bitmap reg
values in struct kvm_arch, along with a bitmap of which regs were
touched by the VMM.

When the first vCPU enters KVM_RUN, zero out the FW feature regs that
were never written to. That way you could defer the clobber operation
and do it exactly once for a VM.
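
The scheme above could be sketched roughly as follows. This is a
userspace model with hypothetical names (the model_* identifiers and the
two example registers are not from the patch), and locking is left out.
One assumption to flag: a VM whose feature regs were never written at
all keeps KVM's defaults here, which follows the "upon first write"
wording of the commit message rather than anything stated in this
review.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; not taken from the patch or from KVM. */
enum model_bmap_reg {
	MODEL_BMAP_STD,
	MODEL_BMAP_VENDOR,
	MODEL_NR_BMAP_REGS,
};

/* Imagined substructure of struct kvm_arch. */
struct model_hvc_desc {
	uint64_t bmap[MODEL_NR_BMAP_REGS];	/* feature bitmap values */
	uint32_t touched;	/* bit n set: the VMM wrote reg n */
	bool vm_started;
};

/* Every feature reg starts at the upper limit KVM advertises. */
static void model_hvc_init(struct model_hvc_desc *d, uint64_t kvm_default)
{
	for (int i = 0; i < MODEL_NR_BMAP_REGS; i++)
		d->bmap[i] = kvm_default;
	d->touched = 0;
	d->vm_started = false;
}

/* VMM write path: record which regs were touched, refuse changes later. */
static int model_hvc_write(struct model_hvc_desc *d, enum model_bmap_reg r,
			   uint64_t val)
{
	assert(r < MODEL_NR_BMAP_REGS);

	if (d->vm_started)
		return val == d->bmap[r] ? 0 : -EBUSY;

	d->bmap[r] = val;
	d->touched |= UINT32_C(1) << r;
	return 0;
}

/* First KVM_RUN: clobber untouched regs, exactly once per VM. */
static void model_hvc_first_run(struct model_hvc_desc *d)
{
	if (d->vm_started)
		return;

	/* Assumption: skip the clobber if the VMM wrote nothing at all. */
	if (d->touched) {
		for (int i = 0; i < MODEL_NR_BMAP_REGS; i++) {
			if (!(d->touched & (UINT32_C(1) << i)))
				d->bmap[i] = 0;
		}
	}
	d->vm_started = true;
}
```

Compared with clearing on the first write, this only iterates over the
feature bitmap regs and needs no write_attempted flag.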

> +out:
> + mutex_unlock(&kvm->lock);
> +}
> +
> +static bool
> +kvm_fw_regs_block_write(struct kvm *kvm, struct hvc_reg_desc *hvc_desc, u64 val)
> +{
> + bool ret = false;
> + unsigned int i;
> + u64 *hc_bmap = NULL;
> +
> + mutex_lock(&kvm->lock);
> +
> + for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
> + hc_bmap = kvm_fw_reg_to_bmap(kvm, fw_reg_ids[i]);
> + if (hc_bmap)
> + break;
> + }
> +
> + if (!hc_bmap)
> + goto out;
> +
> + /* Do not allow any updates if the VM has already started */
> + if (hvc_desc->write_disabled && val != *hc_bmap)
> + ret = true;
> +
> +out:
> + mutex_unlock(&kvm->lock);
> + return ret;
> +}
> +
> int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> {
> void __user *uaddr = (void __user *)(long)reg->addr;
> + struct kvm *kvm = vcpu->kvm;
> + struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
> u64 val;
> int wa_level;
>
> if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
> return -EFAULT;
>
> + if (kvm_fw_regs_block_write(kvm, hvc_desc, val))
> + return -EBUSY;
> +
> + kvm_fw_regs_sanitize(kvm, hvc_desc);
> +
> switch (reg->id) {
> case KVM_REG_ARM_PSCI_VERSION:
> return kvm_arm_set_psci_fw_reg(vcpu, val);
> --
> 2.33.1.1089.g2158813163f-goog
>