Re: [PATCH v9 13/39] KVM: arm64: Manage GCS registers for guests

From: Marc Zyngier
Date: Wed Jul 10 2024 - 14:28:25 EST


On Wed, 10 Jul 2024 18:16:46 +0100,
Mark Brown <broonie@xxxxxxxxxx> wrote:
>
> On Wed, Jul 10, 2024 at 04:17:02PM +0100, Marc Zyngier wrote:
> > Mark Brown <broonie@xxxxxxxxxx> wrote:
>
> > > + if (ctxt_has_gcs(ctxt)) {
>
> > Since this is conditioned on S1PIE, it should only be evaluated
> > when PIE is enabled in the guest.
>
> So make ctxt_has_gcs() embed a check of ctxt_has_s1pie()?

No. I mean nest the whole thing *under* the check for S1PIE.
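
Something along these lines in the EL1 save path (untested sketch,
reusing the ctxt_has_s1pie()/ctxt_has_gcs() helpers from the patch;
GCSCRE0_EL1 is a separate discussion, see below):

	if (ctxt_has_s1pie(ctxt)) {
		ctxt_sys_reg(ctxt, PIR_EL1)   = read_sysreg_el1(SYS_PIR);
		ctxt_sys_reg(ctxt, PIRE0_EL1) = read_sysreg_el1(SYS_PIRE0);

		/* Only worth evaluating once we know S1PIE is present */
		if (ctxt_has_gcs(ctxt)) {
			ctxt_sys_reg(ctxt, GCSPR_EL1) = read_sysreg_el1(SYS_GCSPR);
			ctxt_sys_reg(ctxt, GCSCR_EL1) = read_sysreg_el1(SYS_GCSCR);
		}
	}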

>
> > > + ctxt_sys_reg(ctxt, GCSPR_EL1) = read_sysreg_el1(SYS_GCSPR);
> > > + ctxt_sys_reg(ctxt, GCSCR_EL1) = read_sysreg_el1(SYS_GCSCR);
> > > + ctxt_sys_reg(ctxt, GCSCRE0_EL1) = read_sysreg_s(SYS_GCSCRE0_EL1);
>
> > Why is this part of the EL1 context? It clearly only matters to EL0
> > execution, so it could be switched in load/put on nVHE as well. And
> > actually, given that the whole thing is strictly for userspace, why do
> > we switch *anything* eagerly at all?
>
> GCS can also be used independently at EL1 (and EL2 for that matter);
> it's not purely for userspace even though this series only implements
> use in userspace. GCSPR_EL1 and GCSCR_EL1 control the use of GCS at
> EL1, not EL0, and the guest might be using GCS at EL1 even if the host
> doesn't.
>
> GCSCRE0_EL1 is for EL0 though, it ended up here mainly because it's an
> _EL1 register and we are already context switching PIRE0_EL1 in the EL1
> functions so it seemed consistent to follow the same approach for GCS.
> The _el1 and _user save/restore functions are called from the same place
> for both VHE and nVHE, so the practical impact of the placement should be
> minimal AFAICT. Unlike PIRE0_EL1, GCSCRE0_EL1 only has an impact on
> code running at EL0, so I can move it to the _user functions.

Exactly. That's where it belongs, because we never execute EL0 while a
vcpu is loaded. We can, on the other hand, make use of a uaccess helper
while a vcpu is loaded, and that makes a hell of a difference.

And it makes a difference because it would allow the EL0-specific
context to be loaded differently. We had this at some point, and it
was a reasonable optimisation that we lost. I'm keen on bringing it
back.
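
Concretely, something like this in __sysreg_save_user_state(), with the
restore side mirroring it (untested sketch):

	static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
	{
		ctxt_sys_reg(ctxt, TPIDR_EL0)	= read_sysreg(tpidr_el0);
		ctxt_sys_reg(ctxt, TPIDRRO_EL0)	= read_sysreg(tpidrro_el0);

		/* GCSCRE0_EL1 only affects EL0, so handle it with the EL0 state */
		if (ctxt_has_gcs(ctxt))
			ctxt_sys_reg(ctxt, GCSCRE0_EL1) = read_sysreg_s(SYS_GCSCRE0_EL1);
	}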

>
> TBH I'm not following your comments about switching too eagerly here
> at all; where would you expect the switching to be done? You've said
> something along these lines before, which prompted me to send a patch
> to only save the S1PIE registers if they'd been written to, which you
> were quite reasonably not happy with given the extra traps it would
> cause:
>
> https://lore.kernel.org/r/20240301-kvm-arm64-defer-regs-v1-1-401e3de92e97@xxxxxxxxxx
>
> but I'm at a loss as to how to make things less eager otherwise.
>
> > > @@ -2306,7 +2323,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> > > ID_AA64PFR0_EL1_GIC |
> > > ID_AA64PFR0_EL1_AdvSIMD |
> > > ID_AA64PFR0_EL1_FP), },
> > > - ID_SANITISED(ID_AA64PFR1_EL1),
> > > + ID_WRITABLE(ID_AA64PFR1_EL1, ~(ID_AA64PFR1_EL1_RES0 |
> > > + ID_AA64PFR1_EL1_BT)),
>
> > I don't know what you're trying to do here, but that's not right. If
> > you want to make this register writable, here's the shopping list:
>
> > https://lore.kernel.org/all/87ikxsi0v9.wl-maz@xxxxxxxxxx/
>
> Yes, trying to make things writable. I do see we need to exclude more
> bits there, and I'm not clear why I excluded BTI; it looks like I forgot
> to add a TODO comment at some point and finish that off. Sorry about
> that.
>
> In the linked mail you say you want to see all fields explicitly
> handled, could you be more direct about what such explicit handling

That email enumerates, point after point, everything that needs to be
done. I really cannot be clearer or more direct; it is the clearest I
can be, short of writing the code myself. And I have decided not to do
that for once, unless I really need to. As it turns out, I don't.

> would look like? I see a number of examples in the existing code like:
>
> ID_WRITABLE(ID_AA64ZFR0_EL1, ~ID_AA64ZFR0_EL1_RES0),

This is clear: Everything is writable, and there are no bits here that
are otherwise conditional or unsupported.

>
> ID_WRITABLE(ID_AA64ISAR0_EL1, ~ID_AA64ISAR0_EL1_RES0),

Same thing.

> ID_WRITABLE(ID_AA64ISAR1_EL1, ~(ID_AA64ISAR1_EL1_GPI |
> ID_AA64ISAR1_EL1_GPA |
> ID_AA64ISAR1_EL1_API |
> ID_AA64ISAR1_EL1_APA)),

This one needs fixing because of LS64, and I have an in-progress
series for it.

> which look to my eye very similar to the above; they do not visibly and
> explicitly enumerate every field in the registers, and given that there's
> a single mask specified it's not clear how that would look. If
> ID_WRITABLE() took separate read/write masks and combined them it'd be
> more obvious, but it's just not written that way.

I don't really see what it would buy us, but never mind.
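
To be clear about the shape I'm after: an explicit mask where every
unsupported or conditionally-supported field is spelled out, rather
than just RES0 plus whatever happened to be noticed. Purely as an
illustration of that shape (the field names below are picked for the
sake of the example; the actual list is the shopping list from the
linked mail, not this):

	ID_WRITABLE(ID_AA64PFR1_EL1, ~(ID_AA64PFR1_EL1_RES0 |
				       ID_AA64PFR1_EL1_MPAM_frac |
				       ID_AA64PFR1_EL1_RAS_frac |
				       ID_AA64PFR1_EL1_MTE_frac)),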

M.

--
Without deviation from the norm, progress is not possible.