Re: [PATCH 1/1] KVM: arm64: Affinity level 3 support

From: Wei-Lin Chang
Date: Mon Feb 26 2024 - 21:27:25 EST


On Sun, 25 Feb 2024 11:19:05 +0000,
Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> [+Eric who was looking into something related recently]
>
> Hi Wei-Lin,
>
> Thanks for looking into this.
>
> On Sun, 25 Feb 2024 09:02:37 +0000,
> Wei-Lin Chang <r09922117@xxxxxxxxxxxxxxx> wrote:
> >
> > Currently, KVM ARM64 avoids using the Aff3 field for VCPUs, which saves
> > us from having to check for hardware support in ICH_VTR_EL2.A3V or the
>
> That's not strictly true. We do check for A3V support at restore time.
>
> > guest's execution state. However a VCPU could still have its Aff3 bits
> > set to non-zero if the VMM directly changes the VCPU's MPIDR_EL1. This
> > causes a mismatch between MPIDR_EL1.Aff3 and GICR_TYPER[63:56] since 0s
> > are always returned for the latter, failing the GIC Redistributor
> > matching in the VM.
> >
> > Let's fix this by only allowing userspace to write into the Aff3 field
> > of MPIDR_EL1 if Aff3 is valid.
>
> What does "valid" means here? 0 *is* a valid value. Or do you mean a
> non-zero value? Also, this now creates a dependency between GICR_TYPER
> and MPIDR_EL1. How is userspace supposed to order those when restoring
> a VM?
>
> > Additionally, extend reset_mpidr and
> > vgic_mmio_{read,write}_irouter to fully support Aff3. With theses
> > changes, GICR_TYPER can then safely return all four affinity levels.
> >
> > Suggested-by: Saurav Sachidanand <sauravsc@xxxxxxxxxx>
> > Signed-off-by: Wei-Lin Chang <r09922117@xxxxxxxxxxxxxxx>
> > ---
> > arch/arm64/kvm/sys_regs.c | 24 +++++++++++++++++++++---
> > arch/arm64/kvm/vgic/vgic-debug.c | 2 +-
> > arch/arm64/kvm/vgic/vgic-mmio-v3.c | 18 +++++++++++-------
> > include/kvm/arm_vgic.h | 7 ++++++-
> > 4 files changed, 39 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 30253bd199..6694ce851a 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -239,6 +239,19 @@ static u8 get_min_cache_line_size(bool icache)
> > return field + 2;
> > }
> >
> > +static int set_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> > + u64 val)
> > +{
> > + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();
>
> What does this mean for a guest that doesn't have a GICv3?
>
> > +
> > + if (!aff3_valid)
> > + val &= ~((u64)MPIDR_LEVEL_MASK << MPIDR_LEVEL_SHIFT(3));
> > +
> > + __vcpu_sys_reg(vcpu, rd->reg) = val;
> > +
> > + return 0;
> > +}
> > +
> > /* Which cache CCSIDR represents depends on CSSELR value. */
> > static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
> > {
> > @@ -817,10 +830,12 @@ static u64 reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > static u64 reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > {
> > u64 mpidr;
> > + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();
>
> Same thing.
>
> >
> > /*
> > - * Map the vcpu_id into the first three affinity level fields of
> > - * the MPIDR. We limit the number of VCPUs in level 0 due to a
> > + * Map the vcpu_id into the affinity level fields of the MPIDR. The
> > + * fourth level is mapped only if we are running a 64 bit guest and
> > + * A3V is supported. We limit the number of VCPUs in level 0 due to a
> > * limitation to 16 CPUs in that level in the ICC_SGIxR registers
> > * of the GICv3 to be able to address each CPU directly when
> > * sending IPIs.
> > @@ -828,6 +843,8 @@ static u64 reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> > mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> > mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> > + if (aff3_valid)
> > + mpidr |= (u64)((vcpu->vcpu_id >> 20) & 0xff) << MPIDR_LEVEL_SHIFT(3);
>
> From virt/kcvm/kvm_main.c:
>
> static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> {
> int r;
> struct kvm_vcpu *vcpu;
> struct page *page;
>
> if (id >= KVM_MAX_VCPU_IDS)
> return -EINVAL;
>
> [...]
> }
>
> So vcpu_id is capped at KVM_MAX_VCPU_IDS, which is 512 on arm64. How
> does this ever produce anything other than 0? This is, by the way,
> already true for Aff2. Which is why I have always found this change
> extremely questionable: why do you need to describe 2^32 CPUs when you
> can only create 512?
>
> > mpidr |= (1ULL << 31);
> > vcpu_write_sys_reg(vcpu, mpidr, MPIDR_EL1);
> >
> > @@ -2232,7 +2249,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> >
> > { SYS_DESC(SYS_DBGVCR32_EL2), trap_undef, reset_val, DBGVCR32_EL2, 0 },
> >
> > - { SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1 },
> > + { SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1,
> > + .get_user = NULL, .set_user = set_mpidr },
> >
> > /*
> > * ID regs: all ID_SANITISED() entries here must have corresponding
> > diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c
> > index 85606a531d..726cf1bd7b 100644
> > --- a/arch/arm64/kvm/vgic/vgic-debug.c
> > +++ b/arch/arm64/kvm/vgic/vgic-debug.c
> > @@ -206,7 +206,7 @@ static void print_irq_state(struct seq_file *s, struct vgic_irq *irq,
> > " %2d "
> > "%d%d%d%d%d%d%d "
> > "%8d "
> > - "%8x "
> > + "%8llx "
> > " %2x "
> > "%3d "
> > " %2d "
> > diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> > index c15ee1df03..ea0d4ad85a 100644
> > --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> > +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> > @@ -195,13 +195,13 @@ static unsigned long vgic_mmio_read_irouter(struct kvm_vcpu *vcpu,
> > {
> > int intid = VGIC_ADDR_TO_INTID(addr, 64);
> > struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> > + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();
>
> Hint: if you need to write the same expression more than once, you
> probably need a helper for it. Meaning that you will only have to fix
> it once.
>
> > unsigned long ret = 0;
> >
> > if (!irq)
> > return 0;
> >
> > - /* The upper word is RAZ for us. */
> > - if (!(addr & 4))
> > + if (aff3_valid || !(addr & 4))
> > ret = extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
> >
> > vgic_put_irq(vcpu->kvm, irq);
> > @@ -213,11 +213,12 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
> > unsigned long val)
> > {
> > int intid = VGIC_ADDR_TO_INTID(addr, 64);
> > + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();
> > struct vgic_irq *irq;
> > unsigned long flags;
> >
> > - /* The upper word is WI for us since we don't implement Aff3. */
> > - if (addr & 4)
> > + /* The upper word is WI if Aff3 is not valid. */
> > + if (!aff3_valid && addr & 4)
> > return;
> >
> > irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> > @@ -227,8 +228,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
> >
> > raw_spin_lock_irqsave(&irq->irq_lock, flags);
> >
> > - /* We only care about and preserve Aff0, Aff1 and Aff2. */
> > - irq->mpidr = val & GENMASK(23, 0);
> > + irq->mpidr = val & MPIDR_HWID_BITMASK;
> > irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
> >
> > raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> > @@ -323,7 +323,11 @@ static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
> > int target_vcpu_id = vcpu->vcpu_id;
> > u64 value;
> >
> > - value = (u64)(mpidr & GENMASK(23, 0)) << 32;
> > + value = MPIDR_AFFINITY_LEVEL(mpidr, 3) << 56 |
> > + MPIDR_AFFINITY_LEVEL(mpidr, 2) << 48 |
> > + MPIDR_AFFINITY_LEVEL(mpidr, 1) << 40 |
> > + MPIDR_AFFINITY_LEVEL(mpidr, 0) << 32;
>
> Maybe it is time to describe these shifts in an include file, and use
> FIELD_PREP() to construct the whole thing. It will be a lot more
> readable.
>
> > +
> > value |= ((target_vcpu_id & 0xffff) << 8);
> >
> > if (vgic_has_its(vcpu->kvm))
> > diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> > index 8cc38e836f..b464ac1b79 100644
> > --- a/include/kvm/arm_vgic.h
> > +++ b/include/kvm/arm_vgic.h
> > @@ -143,7 +143,7 @@ struct vgic_irq {
> > unsigned int host_irq; /* linux irq corresponding to hwintid */
> > union {
> > u8 targets; /* GICv2 target VCPUs mask */
> > - u32 mpidr; /* GICv3 target VCPU */
> > + u64 mpidr; /* GICv3 target VCPU */
>
> Do we really need to grow each interrupt object by 4 bytes, specially
> when we at most use 4 bytes? I'd rather we store the affinity in a
> compact way and change the way we compare it to the vcpu's MPIDR_EL1.
>
> > };
> > u8 source; /* GICv2 SGIs only */
> > u8 active_source; /* GICv2 SGIs only */
> > @@ -413,6 +413,11 @@ static inline int kvm_vgic_get_max_vcpus(void)
> > return kvm_vgic_global_state.max_gic_vcpus;
> > }
> >
> > +static inline bool kvm_vgic_has_a3v(void)
> > +{
> > + return kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_A3V_MASK;
> > +}
> > +
>
> I can see multiple problems with this:
>
> - this is the host state, which shouldn't necessarily represent the
> guest state. It should be possible to restore a VM that have a
> different A3V value and still have the same guarantees. There is
> however a small nit around ICV_CTLR_EL1.A3V, which would require
> trapping to emulate the A3V bit.
>
> - this assumes GICv3, which is definitely not universal (we support
> GICv2, for which no such restriction actually exists).
>
> Finally, I don't see VM save/restore being addressed here, and I
> suspect it hasn't been looked at.
>
> Overall, this patch does too many things, and it should be split in
> discrete changes. I also want to see an actual justification for Aff3
> support. And if we introduce it, it must be fully virtualised
> (independent of the A3V support on the host).

Hi Marc,

Really appreciate for the feedback. I think I understand most of your
comments and agree with them. It appears that I don't fully understand
the changes that I am doing with this. Thanks for explaining.

Cheers,
Wei-Lin Chang