Re: [RFC PATCH 18/26] kvm: arm64: Intercept PSCI_CPU_OFF host SMC calls

From: Marc Zyngier
Date: Thu Nov 05 2020 - 06:34:20 EST


On 2020-11-04 18:36, David Brazdil wrote:
Add a handler of the CPU_OFF PSCI host SMC trapped in KVM nVHE hyp code.
When invoked, it changes the recorded state of the core to OFF before
forwarding the call to EL3. If the call fails, it changes the state back
to ON and returns the error to the host.

Signed-off-by: David Brazdil <dbrazdil@xxxxxxxxxx>
---
arch/arm64/kvm/hyp/nvhe/psci.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index c3d0a6246c66..00dc0cab860c 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -13,6 +13,8 @@
#include <kvm/arm_psci.h>
#include <uapi/linux/psci.h>

+#include <nvhe/spinlock.h>
+
/* Config options set by the host. */
u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
u32 kvm_host_psci_function_id[PSCI_FN_MAX];
@@ -20,6 +22,7 @@ s64 hyp_physvirt_offset;

#define __hyp_pa(x) ((phys_addr_t)(x) + hyp_physvirt_offset)

+static DEFINE_PER_CPU(hyp_spinlock_t, psci_cpu_lock);
DEFINE_PER_CPU(enum kvm_nvhe_psci_state, psci_cpu_state);

static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
@@ -76,9 +79,32 @@ static __noreturn unsigned long
psci_forward_noreturn(struct kvm_cpu_context *ho
hyp_panic(); /* unreachable */
}

+static int psci_cpu_off(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ hyp_spinlock_t *cpu_lock = this_cpu_ptr(&psci_cpu_lock);
+ enum kvm_nvhe_psci_state *cpu_power = this_cpu_ptr(&psci_cpu_state);
+ u32 power_state = (u32)host_ctxt->regs.regs[1];
+ int ret;
+
+ /* Change the recorded state to OFF before forwarding the call. */
+ hyp_spin_lock(cpu_lock);
+ *cpu_power = KVM_NVHE_PSCI_CPU_OFF;
+ hyp_spin_unlock(cpu_lock);

So at this point, another CPU can observe the vcpu being "off", and issue
a PSCI_ON, which may result in an "already on". I'm not sure this is an
actual issue, but it is worth documenting.

What is definitely missing is a rational about *why* we need to track the
state of the vcpus. I naively imagined that we could directly proxy the
PSCI calls to EL3, only repainting PC for PSCI_ON.

Thanks,

M.
--
Jazz is not dead. It just smells funny...