Re: [PATCH 3/4] KVM: arm64: add emulation for CTR_EL0 register

From: Marc Zyngier
Date: Sun Apr 14 2024 - 04:36:08 EST


On Sat, 13 Apr 2024 14:50:42 +0100,
Sebastian Ott <sebott@xxxxxxxxxx> wrote:
>
> On Sat, 13 Apr 2024, Marc Zyngier wrote:
>
> > On Fri, 05 Apr 2024 13:01:07 +0100,
> > Sebastian Ott <sebott@xxxxxxxxxx> wrote:
> >>
> >> CTR_EL0 is currently handled as an invariant register, thus
> >> guests will be presented with the host value of that register.
> >> Add emulation for CTR_EL0 based on a per VM value.
> >>
> >> When CTR_EL0 is changed the reset function for CLIDR_EL1 is
> >> called to make sure we present the guest with consistent
> >> register values.
> >
> > Isn't that a change in the userspace ABI? You are now creating an
> > explicit ordering between the write to CTR_EL0 and the rest of the
> > cache hierarchy registers. It has the obvious capacity to lead to the
> > wrong result in a silent way...
>
> Yea, that's why I've asked in the cover letter if userspace would be
> ok with that. I thought that this is what you suggested in your reply
> to the
> RFC. (https://lore.kernel.org/linux-arm-kernel/20240318111636.10613-5-sebott@xxxxxxxxxx/T/#m0aea5b744774f123bd9a4a4b7be6878f6c737f63)
>
> But I guess I've got that wrong.

Not wrong, just incomplete. I think it is fine to recompute the cache
topology if there is no restored cache state at the point where
CTL_EL0 is written. However, if a topology has been restored (and that
it is incompatible with the write to CTR_EL0, the write must fail. The
ugly part is that the CCSIDR array is per vcpu and not per VM.

> Do we have other means to handle the dependencies between registers?
> Allow inconsistent values and do a sanity check before the first
> vcpu_run()?

Failing on vcpu_run() is the worse kind of failure, because you can't
easily find the failure cause, other than by looking at each single
register trying to spot the inconsistency.

In the case at hand, I think validating CTL_EL0 against the currently
visible topology is the right thing to do.

>
> >>
> >> Signed-off-by: Sebastian Ott <sebott@xxxxxxxxxx>
> >> ---
> >> arch/arm64/kvm/sys_regs.c | 72 ++++++++++++++++++++++++++++++++++-----
> >> 1 file changed, 64 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> >> index 4d29b1a0842d..b0ba292259f9 100644
> >> --- a/arch/arm64/kvm/sys_regs.c
> >> +++ b/arch/arm64/kvm/sys_regs.c
> >> @@ -1874,6 +1874,55 @@ static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> >> return true;
> >> }
> >>
> >> +static u64 reset_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd)
> >> +{
> >> + vcpu->kvm->arch.ctr_el0 = 0;
> >> + return kvm_get_ctr_el0(vcpu->kvm);
> >
> > I'd expect the cached value to be reset instead of being set to
> > 0. What are you achieving by this?
>
> The idea was that kvm->arch.ctr_el0 == 0 means we use the host value and
> don't set up a trap.

I'd rather you keep the shadow register to a valid value at all times,
and simply compare it to the HW-provided version to decide whether you
need to trap. The main reason is that we don't know how the
architecture will evolve, and CTR_EL0==0 may become a legal value in
the future (unlikely, but that's outside of our control).

>
> >> +}
> >> +
> >> +static int get_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> >> + u64 *val)
> >> +{
> >> + *val = kvm_get_ctr_el0(vcpu->kvm);
> >> + return 0;
> >> +}
> >> +
> >> +static const struct sys_reg_desc *get_sys_reg_desc(u32 encoding);
> >> +
> >> +static int set_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> >> + u64 val)
> >> +{
> >> + u64 host_val = read_sanitised_ftr_reg(SYS_CTR_EL0);
> >> + u64 old_val = kvm_get_ctr_el0(vcpu->kvm);
> >> + const struct sys_reg_desc *clidr_el1;
> >> + int ret;
> >> +
> >> + if (val == old_val)
> >> + return 0;
> >> +
> >> + if (kvm_vm_has_ran_once(vcpu->kvm))
> >> + return -EBUSY;
> >> +
> >> + mutex_lock(&vcpu->kvm->arch.config_lock);
> >> + ret = arm64_check_features(vcpu, rd, val);
> >> + if (ret) {
> >> + mutex_unlock(&vcpu->kvm->arch.config_lock);
> >> + return ret;
> >> + }
> >> + if (val != host_val)
> >> + vcpu->kvm->arch.ctr_el0 = val;
> >> + else
> >> + vcpu->kvm->arch.ctr_el0 = 0;
> >> +
> >> + mutex_unlock(&vcpu->kvm->arch.config_lock);
> >> +
> >> + clidr_el1 = get_sys_reg_desc(SYS_CLIDR_EL1);
> >> + if (clidr_el1)
> >> + clidr_el1->reset(vcpu, clidr_el1);
> >> +
> >> + return 0;
> >
> > No check against what can be changed, and in what direction? You seem
> > to be allowing a guest to migrate from a host with IDC==1 to one where
> > IDC==0 (same for DIC). How can that work? Same for the cache lines,
> > which can be larger on the target... How will the guest survive that?
>
> Shouldn't this all be handled by arm64_check_features() using the safe
> value definitions from ftr_ctr? (I'll double check that..)

I think I may have read the code the wrong way. IDC/DIC should be OK
due to the feature check. I'm not sure about the line-size fields
though, and we should make sure that only a *smaller* line size is
allowed.

Then, there is the case of all the other fields. TminLine should get
the same treatment as the other cache line size fields, with the
additional constraint that it should be RES0 if the guest isn't MTE
aware. CWG and ERG are other interesting cases, and I don't think they
should be writable (your patch looks correct in that respect).

>
> >> @@ -4049,6 +4102,9 @@ void kvm_init_sysreg(struct kvm_vcpu *vcpu)
> >> vcpu->arch.hcrx_el2 |= (HCRX_EL2_MSCEn | HCRX_EL2_MCE2);
> >> }
> >>
> >> + if (vcpu->kvm->arch.ctr_el0)
> >> + vcpu->arch.hcr_el2 |= HCR_TID2;
> >
> > Why trap CTR_EL0 if the values are the same as the host?
>
> For values same as host vcpu->kvm->arch.ctr_el0 would be zero and
> reg access would not be trapped.
>
> > I really dislike the use of the value 0 as a such an indication.
>
> OK.
>
> > Why isn't this grouped with the traps in vcpu_reset_hcr()?
>
> I was under the impression that vcpu_reset_hcr() is called too early
> to decide if we need to set up a trap or not (but lemme double check
> that).

Could well be (it is probably decided at vpcu init time). but in that
case, it could be worth moving all the TID2/TID4 trapping together.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.