Re: [PATCH v1 3/5] arm64: KVM: Add interface to set guest value for TRFCR register

From: James Clark
Date: Fri Sep 29 2023 - 08:09:51 EST

On 29/09/2023 11:17, Marc Zyngier wrote:
> On Thu, 28 Sep 2023 16:16:07 +0100,
> James Clark <james.clark@xxxxxxx> wrote:
>>
>> Add an interface for the Coresight driver to use to set the value of the
>> TRFCR register for the guest. This register controls the exclude
>> settings for trace at different exception levels, and is used to
>> honor the exclude_host and exclude_guest parameters from the Perf
>> session. This will be used to later write TRFCR_EL1 on nVHE at guest
>> switch. For VHE, TRFCR_EL1 is written immediately. Because guest writes
>> to the register are trapped, the value will persist and can't be
>> modified.
>>
>> The settings must be copied to the vCPU before each run in the same
>> way that PMU events are because the per-cpu struct isn't accessible in
>> protected mode.
>>
>> Now that both guest and host values are saved, rename trfcr_el1 to
>> host_trfcr_el1 to make it clear that's the value that should be restored
>> on return to the host.
>>
>> Reviewed-by: Mark Brown <broonie@xxxxxxxxxx> (sysreg)
>> Signed-off-by: James Clark <james.clark@xxxxxxx>
>> ---
>> arch/arm64/include/asm/kvm_host.h | 6 +++++-
>> arch/arm64/kvm/arm.c | 1 +
>> arch/arm64/kvm/debug.c | 21 +++++++++++++++++++++
>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 4 ++--
>> arch/arm64/tools/sysreg | 4 ++++
>> 5 files changed, 33 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 498f922f4f41..0e57827a0cf2 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -547,7 +547,8 @@ struct kvm_vcpu_arch {
>> /* Statistical profiling extension */
>> u64 pmscr_el1;
>> /* Self-hosted trace */
>> - u64 trfcr_el1;
>> + u64 host_trfcr_el1;
>> + u64 guest_trfcr_el1;
>> } host_debug_state;
>
> I think it is high time we stop having *guest* state in a structure
> that is obviously for the host, starting by moving the breakpoint and
> watchpoint out, and into the sysreg array.
>

It looks like host_debug_state::regs is actually the host debug state,
despite the type being called "struct kvm_guest_debug_arch".

But yeah, I can move pmscr_el1 and trfcr_el1 to the guest and host sysreg
arrays; that would make sense.

host_debug_state::regs seems to be a bit more intertwined with the
kvm_arch_vcpu_ioctl_set_guest_debug() ioctl and the pattern to support
two different sets of guest debug state, so I'm not sure if it's worth
touching that at this point:

	/*
	 * We maintain more than a single set of debug registers to support
	 * debugging the guest
	 ...
	 * debug_ptr points to the set of debug registers that should be loaded
	 * onto the hardware when running the guest.
	 */
	struct kvm_guest_debug_arch *debug_ptr;
	struct kvm_guest_debug_arch vcpu_debug_state;
	struct kvm_guest_debug_arch external_debug_state;

> And then TRFCR_EL1 can join the fun. But it is pretty unclear whether
> that's actually the guest state.
>

It has an effect on the guest, but can never actually be read or written
by the guest. Not sure if that pattern exists elsewhere...

>>
>> /* VGIC state */
>> @@ -1097,6 +1098,8 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
>> void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
>> void kvm_clr_pmu_events(u32 clr);
>> bool kvm_set_pmuserenr(u64 val);
>> +void kvm_etm_set_guest_trfcr(u64 trfcr_guest);
>> +void kvm_etm_update_vcpu_events(struct kvm_vcpu *vcpu);
>> #else
>> static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
>> static inline void kvm_clr_pmu_events(u32 clr) {}
>> @@ -1104,6 +1107,7 @@ static inline bool kvm_set_pmuserenr(u64 val)
>> {
>> return false;
>> }
>> +static inline void kvm_etm_set_guest_trfcr(u64 trfcr_guest) {}
>> #endif
>>
>> void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 1bfdd583b261..65e805dc1d7a 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -958,6 +958,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>> kvm_vgic_flush_hwstate(vcpu);
>>
>> kvm_pmu_update_vcpu_events(vcpu);
>> + kvm_etm_update_vcpu_events(vcpu);
>>
>> /*
>> * Ensure we set mode to IN_GUEST_MODE after we disable
>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>> index 6a1bad1a921b..379d2677961f 100644
>> --- a/arch/arm64/kvm/debug.c
>> +++ b/arch/arm64/kvm/debug.c
>> @@ -22,6 +22,7 @@
>> DBG_MDSCR_MDE)
>>
>> static DEFINE_PER_CPU(u64, mdcr_el2);
>> +static DEFINE_PER_CPU(u64, guest_trfcr);
>
> Hold on a sec. Why is that global? I'd expect the tracing to be
> specific to a vcpu, and not to affect *everything*.
>

I suppose because it is a property of the host CPU rather than the vcpu.
The vcpu might not even exist yet when the Perf session starts, and Perf
sessions only care about the host CPU's perspective. It's defining the
guest trfcr value for any vcpu that might or might not run on this core
in the future, rather than any one specific vcpu.

I copied the pattern from the existing Perf PMU settings in
arch/arm64/kvm/pmu.c:

static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events);

We could flip it around and make KVM say "what's the guest trfcr value
for this core?" to the Coresight driver at the point of guest switch,
but all that would mean is that the global would be in the Coresight
driver instead.
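
To illustrate the per-CPU pattern described above, here is a rough
user-space model. It is a sketch only, not kernel code: a plain array
stands in for the per-CPU variable, and every model_* name is made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Stand-in for DEFINE_PER_CPU(u64, guest_trfcr): one slot per host CPU. */
static uint64_t model_guest_trfcr[MODEL_NR_CPUS];

/* Hypothetical, cut-down vCPU: only the field the hypervisor side reads. */
struct model_vcpu {
	uint64_t guest_trfcr_el1;
};

/*
 * Perf session start: record the guest filter value for one host CPU.
 * No vCPU has to exist at this point.
 */
static void model_set_guest_trfcr(int cpu, uint64_t val)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_guest_trfcr[cpu] = val;
}

/*
 * Before each run: whichever vCPU is about to run on this host CPU picks
 * up that CPU's current setting.
 */
static void model_update_vcpu(int cpu, struct model_vcpu *vcpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	vcpu->guest_trfcr_el1 = model_guest_trfcr[cpu];
}
```

The point of the model is that the setting belongs to the host CPU, so a
vCPU that migrates to another core simply inherits that core's value on
its next run.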

>>
>> /**
>> * save/restore_guest_debug_regs
>> @@ -342,3 +343,23 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
>> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE);
>> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRFCR);
>> }
>> +
>> +void kvm_etm_set_guest_trfcr(u64 trfcr_guest)
>> +{
>> + if (has_vhe())
>> + write_sysreg_s(trfcr_guest, SYS_TRFCR_EL12);
>> + else
>> + *this_cpu_ptr(&guest_trfcr) = trfcr_guest;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_etm_set_guest_trfcr);
>
> In what context can this be called? What if we want to expose TRFCR to
> guests instead? It looks to me that this ultimately precludes such
> use.
>

Yes, it does, but Coresight isn't currently exposed to guests at all, so
it's not a problem.

In the future, if we did decide to do it, there would be significant work
to somehow share the components between the host and the guest. But even
if they were shared, in the case where the guest isn't using trace, the
TRFCR and exclude_guest settings would still need to work, so something
of this form would still be required.

>> +
>> +/*
>> + * Updates the vcpu's view of the etm events for this cpu. Must be
>> + * called before every vcpu run after disabling interrupts, to ensure
>> + * that an interrupt cannot fire and update the structure.
>> + */
>> +void kvm_etm_update_vcpu_events(struct kvm_vcpu *vcpu)
>> +{
>> + if (!has_vhe() && vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRFCR))
>> + vcpu->arch.host_debug_state.guest_trfcr_el1 = *this_cpu_ptr(&guest_trfcr);
>
> But what is the point of this per-vcpu field if all you care about is
> some per-CPU data?
>

This was the only way I could find to pass data from the host and have
it available to EL2 at guest switch. I saw it was also done that way for
PMUs in kvm_pmu_update_vcpu_events(). Maybe it could be done by exposing
the guest_trfcr global to both sides with some annotation? But I assumed
that wasn't idiomatic as it wasn't done for PMUs.
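
For what it's worth, the intended nVHE use of the per-vcpu copy could be
modelled roughly as below. This is a simplified user-space sketch under
my own assumptions, not the actual hyp code: a global variable stands in
for the TRFCR_EL1 hardware register, the flush and flag checks are left
out, and all model_* names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the TRFCR_EL1 hardware register. */
static uint64_t model_trfcr_el1;

/* Hypothetical, cut-down version of the two fields in host_debug_state. */
struct model_debug_state {
	uint64_t host_trfcr_el1;  /* saved on entry, restored on exit */
	uint64_t guest_trfcr_el1; /* copied in by the host before the run */
};

/* Guest entry: save the host value, then apply the guest filter. */
static void model_switch_to_guest(struct model_debug_state *s)
{
	s->host_trfcr_el1 = model_trfcr_el1;
	model_trfcr_el1 = s->guest_trfcr_el1;
}

/* Guest exit: put the host value back. */
static void model_switch_to_host(const struct model_debug_state *s)
{
	model_trfcr_el1 = s->host_trfcr_el1;
}
```

In this model the EL2 side only ever touches the per-vcpu structure, which
is why the value has to be copied there before each run.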

>> +}
>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> index 89c208112eb7..55bc01e9808f 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> @@ -86,7 +86,7 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
>> __debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
>> /* Disable and flush Self-Hosted Trace generation */
>> if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRFCR))
>> - __debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
>> + __debug_save_trace(&vcpu->arch.host_debug_state.host_trfcr_el1);
>> }
>>
>> void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
>> @@ -99,7 +99,7 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
>> if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
>> __debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
>> if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRFCR))
>> - __debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
>> + __debug_restore_trace(vcpu->arch.host_debug_state.host_trfcr_el1);
>> }
>>
>> void __debug_switch_to_host(struct kvm_vcpu *vcpu)
>> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
>> index 992722c0c23b..295a4a625b8b 100644
>> --- a/arch/arm64/tools/sysreg
>> +++ b/arch/arm64/tools/sysreg
>> @@ -2661,3 +2661,7 @@ EndSysreg
>> Sysreg TRFCR_EL2 3 4 1 2 1
>> Fields TRFCR_EL2
>> EndSysreg
>> +
>> +Sysreg TRFCR_EL12 3 5 1 2 1
>> +Fields TRFCR_ELx
>> +EndSysreg
>
> Please move this to the first patch.
>
> M.
>

Will do.

James