Re: [PATCH kvm/queue v2 2/3] perf: x86/core: Add interface to query perfmon_event_map[] directly

From: Like Xu
Date: Thu Feb 10 2022 - 07:56:12 EST


On 10/2/2022 2:57 am, Dave Hansen wrote:
On 2/9/22 10:47, Jim Mattson wrote:
On Wed, Feb 9, 2022 at 7:41 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:

On 2/9/22 05:21, Peter Zijlstra wrote:
On Wed, Feb 02, 2022 at 02:35:45PM -0800, Jim Mattson wrote:
3) TDX is going to pull the rug out from under us anyway. When the TDX
module usurps control of the PMU, any active host counters are going
to stop counting. We are going to need a way of telling the host perf
subsystem what's happening, or other host perf clients are going to
get bogus data.
That's not acceptable behaviour. I'm all for unilaterally killing any
guest that does this.

I'm not sure where the "bogus data" comes from or what it refers to
specifically. But, the host does have some level of control:

I was referring to gaps in the collection of data that the host perf
subsystem doesn't know about if ATTRIBUTES.PERFMON is set for a TDX
guest. This can potentially be a problem if someone is trying to
measure events per unit of time.

Ahh, that makes sense.

Does SGX cause problems for these people? It can create some of the same
collection gaps:

performance monitoring activities are suppressed when entering
an opt-out (of performance monitoring) enclave.


Are end perf users aware of the collection gaps caused by code running under SGX?

As far as I know, there shouldn't be one yet; we may need a tool like "perf-kvm" for SGX enclaves.

The host VMM controls whether a guest TD can use the performance
monitoring ISA using the TD’s ATTRIBUTES.PERFMON bit...

So, worst-case, we don't need to threaten to kill guests. The host can
just deny access in the first place.

The KVM module parameter "enable_pmu" might be respected,
together with a per-TD guest user space control option.


I'm not too picky about what the PMU does, but the TDX behavior didn't
seem *that* onerous to me. The gory details are all in "On-TD
Performance Monitoring" here:

https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1.0-public-spec-v0.931.pdf

My read on it is that the TDX host _can_ cede the PMU to TDX guests if it
wants. I assume the context-switching model Jim mentioned is along the
lines of what TDX is already doing on host<->guest transitions.

Right. If ATTRIBUTES.PERFMON is set, then "perfmon state is
context-switched by the Intel TDX module across TD entry and exit
transitions." Furthermore, the VMM has no access to guest perfmon
state.

Even if the guest TD is under off-TD debug and is untrusted?

I think we (host administrators) need to profile off-TD guests to locate
performance bottlenecks with a holistic view, regardless of whether the
ATTRIBUTES.PERFMON bit is cleared or not.

Perhaps shared memory could be a way to pass guest performance data
to the host if PMU activities are suppressed across TD entry and exit
transitions for a guest TD that is under off-TD debug and is untrusted.


If you're saying that setting this bit is unacceptable, then perhaps
the TDX folks need to redesign their in-guest PMU support.

It's fine with *me*, but I'm not too picky about the PMU. But, it
sounded like Peter was pretty concerned about it.

One protocol I've seen is that the (TD or normal) guest cannot compromise
the host's access to PMU resources (at least in the host runtime).

It's fine and expected that performance data within the trusted TDX guest
should be logically isolated from host data (without artificial aggregation).


In any case, if we (Linux folks) need a change, it's *possible* because
most of this policy is implemented in software in the TDX module. It
would just be painful for the folks who came up with the existing mechanism.


When the code to enable ATTRIBUTES.PERFMON appears on the mailing list,
that will be a good time window for more discussion.