Re: [PATCH 1/1] KVM: arm64: Affinity level 3 support

From: Marc Zyngier
Date: Sun Feb 25 2024 - 06:19:19 EST


[+Eric who was looking into something related recently]

Hi Wei-Lin,

Thanks for looking into this.

On Sun, 25 Feb 2024 09:02:37 +0000,
Wei-Lin Chang <r09922117@xxxxxxxxxxxxxxx> wrote:
>
> Currently, KVM ARM64 avoids using the Aff3 field for VCPUs, which saves
> us from having to check for hardware support in ICH_VTR_EL2.A3V or the

That's not strictly true. We do check for A3V support at restore time.

> guest's execution state. However a VCPU could still have its Aff3 bits
> set to non-zero if the VMM directly changes the VCPU's MPIDR_EL1. This
> causes a mismatch between MPIDR_EL1.Aff3 and GICR_TYPER[63:56] since 0s
> are always returned for the latter, failing the GIC Redistributor
> matching in the VM.
>
> Let's fix this by only allowing userspace to write into the Aff3 field
> of MPIDR_EL1 if Aff3 is valid.

What does "valid" means here? 0 *is* a valid value. Or do you mean a
non-zero value? Also, this now creates a dependency between GICR_TYPER
and MPIDR_EL1. How is userspace supposed to order those when restoring
a VM?

> Additionally, extend reset_mpidr and
> vgic_mmio_{read,write}_irouter to fully support Aff3. With theses
> changes, GICR_TYPER can then safely return all four affinity levels.
>
> Suggested-by: Saurav Sachidanand <sauravsc@xxxxxxxxxx>
> Signed-off-by: Wei-Lin Chang <r09922117@xxxxxxxxxxxxxxx>
> ---
> arch/arm64/kvm/sys_regs.c | 24 +++++++++++++++++++++---
> arch/arm64/kvm/vgic/vgic-debug.c | 2 +-
> arch/arm64/kvm/vgic/vgic-mmio-v3.c | 18 +++++++++++-------
> include/kvm/arm_vgic.h | 7 ++++++-
> 4 files changed, 39 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 30253bd199..6694ce851a 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -239,6 +239,19 @@ static u8 get_min_cache_line_size(bool icache)
> return field + 2;
> }
>
> +static int set_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> + u64 val)
> +{
> + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();

What does this mean for a guest that doesn't have a GICv3?

> +
> + if (!aff3_valid)
> + val &= ~((u64)MPIDR_LEVEL_MASK << MPIDR_LEVEL_SHIFT(3));
> +
> + __vcpu_sys_reg(vcpu, rd->reg) = val;
> +
> + return 0;
> +}
> +
> /* Which cache CCSIDR represents depends on CSSELR value. */
> static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
> {
> @@ -817,10 +830,12 @@ static u64 reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> static u64 reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> {
> u64 mpidr;
> + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();

Same thing.

>
> /*
> - * Map the vcpu_id into the first three affinity level fields of
> - * the MPIDR. We limit the number of VCPUs in level 0 due to a
> + * Map the vcpu_id into the affinity level fields of the MPIDR. The
> + * fourth level is mapped only if we are running a 64 bit guest and
> + * A3V is supported. We limit the number of VCPUs in level 0 due to a
> * limitation to 16 CPUs in that level in the ICC_SGIxR registers
> * of the GICv3 to be able to address each CPU directly when
> * sending IPIs.
> @@ -828,6 +843,8 @@ static u64 reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> + if (aff3_valid)
> + mpidr |= (u64)((vcpu->vcpu_id >> 20) & 0xff) << MPIDR_LEVEL_SHIFT(3);

From virt/kcvm/kvm_main.c:

static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
{
int r;
struct kvm_vcpu *vcpu;
struct page *page;

if (id >= KVM_MAX_VCPU_IDS)
return -EINVAL;

[...]
}

So vcpu_id is capped at KVM_MAX_VCPU_IDS, which is 512 on arm64. How
does this ever produce anything other than 0? This is, by the way,
already true for Aff2. Which is why I have always found this change
extremely questionable: why do you need to describe 2^32 CPUs when you
can only create 512?

> mpidr |= (1ULL << 31);
> vcpu_write_sys_reg(vcpu, mpidr, MPIDR_EL1);
>
> @@ -2232,7 +2249,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>
> { SYS_DESC(SYS_DBGVCR32_EL2), trap_undef, reset_val, DBGVCR32_EL2, 0 },
>
> - { SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1 },
> + { SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1,
> + .get_user = NULL, .set_user = set_mpidr },
>
> /*
> * ID regs: all ID_SANITISED() entries here must have corresponding
> diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c
> index 85606a531d..726cf1bd7b 100644
> --- a/arch/arm64/kvm/vgic/vgic-debug.c
> +++ b/arch/arm64/kvm/vgic/vgic-debug.c
> @@ -206,7 +206,7 @@ static void print_irq_state(struct seq_file *s, struct vgic_irq *irq,
> " %2d "
> "%d%d%d%d%d%d%d "
> "%8d "
> - "%8x "
> + "%8llx "
> " %2x "
> "%3d "
> " %2d "
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index c15ee1df03..ea0d4ad85a 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -195,13 +195,13 @@ static unsigned long vgic_mmio_read_irouter(struct kvm_vcpu *vcpu,
> {
> int intid = VGIC_ADDR_TO_INTID(addr, 64);
> struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();

Hint: if you need to write the same expression more than once, you
probably need a helper for it. Meaning that you will only have to fix
it once.

> unsigned long ret = 0;
>
> if (!irq)
> return 0;
>
> - /* The upper word is RAZ for us. */
> - if (!(addr & 4))
> + if (aff3_valid || !(addr & 4))
> ret = extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>
> vgic_put_irq(vcpu->kvm, irq);
> @@ -213,11 +213,12 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
> unsigned long val)
> {
> int intid = VGIC_ADDR_TO_INTID(addr, 64);
> + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();
> struct vgic_irq *irq;
> unsigned long flags;
>
> - /* The upper word is WI for us since we don't implement Aff3. */
> - if (addr & 4)
> + /* The upper word is WI if Aff3 is not valid. */
> + if (!aff3_valid && addr & 4)
> return;
>
> irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> @@ -227,8 +228,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>
> raw_spin_lock_irqsave(&irq->irq_lock, flags);
>
> - /* We only care about and preserve Aff0, Aff1 and Aff2. */
> - irq->mpidr = val & GENMASK(23, 0);
> + irq->mpidr = val & MPIDR_HWID_BITMASK;
> irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
>
> raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> @@ -323,7 +323,11 @@ static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
> int target_vcpu_id = vcpu->vcpu_id;
> u64 value;
>
> - value = (u64)(mpidr & GENMASK(23, 0)) << 32;
> + value = MPIDR_AFFINITY_LEVEL(mpidr, 3) << 56 |
> + MPIDR_AFFINITY_LEVEL(mpidr, 2) << 48 |
> + MPIDR_AFFINITY_LEVEL(mpidr, 1) << 40 |
> + MPIDR_AFFINITY_LEVEL(mpidr, 0) << 32;

Maybe it is time to describe these shifts in an include file, and use
FIELD_PREP() to construct the whole thing. It will be a lot more
readable.

> +
> value |= ((target_vcpu_id & 0xffff) << 8);
>
> if (vgic_has_its(vcpu->kvm))
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 8cc38e836f..b464ac1b79 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -143,7 +143,7 @@ struct vgic_irq {
> unsigned int host_irq; /* linux irq corresponding to hwintid */
> union {
> u8 targets; /* GICv2 target VCPUs mask */
> - u32 mpidr; /* GICv3 target VCPU */
> + u64 mpidr; /* GICv3 target VCPU */

Do we really need to grow each interrupt object by 4 bytes, specially
when we at most use 4 bytes? I'd rather we store the affinity in a
compact way and change the way we compare it to the vcpu's MPIDR_EL1.

> };
> u8 source; /* GICv2 SGIs only */
> u8 active_source; /* GICv2 SGIs only */
> @@ -413,6 +413,11 @@ static inline int kvm_vgic_get_max_vcpus(void)
> return kvm_vgic_global_state.max_gic_vcpus;
> }
>
> +static inline bool kvm_vgic_has_a3v(void)
> +{
> + return kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_A3V_MASK;
> +}
> +

I can see multiple problems with this:

- this is the host state, which shouldn't necessarily represent the
guest state. It should be possible to restore a VM that have a
different A3V value and still have the same guarantees. There is
however a small nit around ICV_CTLR_EL1.A3V, which would require
trapping to emulate the A3V bit.

- this assumes GICv3, which is definitely not universal (we support
GICv2, for which no such restriction actually exists).

Finally, I don't see VM save/restore being addressed here, and I
suspect it hasn't been looked at.

Overall, this patch does too many things, and it should be split in
discrete changes. I also want to see an actual justification for Aff3
support. And if we introduce it, it must be fully virtualised
(independent of the A3V support on the host).

Thanks,

M.

--
Without deviation from the norm, progress is not possible.