Re: [PATCH v2] KVM: x86: PMU Event Filter

From: Wei Wang
Date: Tue Jul 16 2019 - 04:44:24 EST

On 07/16/2019 08:10 AM, Eric Hankland wrote:
I think just disabling guest cpuid might not be enough, since guest
could write to the msr without checking the cpuid.

Why not just add a bitmap for fixed counter?
e.g. fixed_counter_reject_bitmap

At the beginning of reprogram_fixed_counter, we could add the check:

if (test_bit(idx, &kvm->arch.fixed_counter_reject_bitmap))
return -EACCES;

(Please test with your old guest and see if they have issues if we
inject #GP when
they try to set the fixed_ctrl msr. If there is, we could drop -EACCESS

The bitmap could be set at kvm_vm_ioctl_set_pmu_event_filter.
intel_pmu_refresh() checks the guest cpuid and sets the number of
fixed counters according to that:
pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed,

and reprogram_fixed_counters()/get_fixed_pmc() respect this so the
guest can't just ignore the cpuid.

Yes, but as you noticed, we couldn't disable fixed counter 2 while keeping
counter 3 running via that.

Adding a bitmap does let you do things like disable the first counter
but keep the second and third, but given that there are only three and
the events are likely to be on a whitelist anyway, it seemed like
adding the bitmap wasn't worth it. If you still feel the same way even
though we can disable them via the cpuid, I can add this in.

We need the design to be architecturally clean. For example, if the hardware later
comes up with fixed counter 5 and 6.
5 is something we really want to expose to the guest to use while 6 isn't,
can our design here (further) support that?

We don't want to re-design this at that time. However, extending what we have would
be acceptable. So, if you hesitate to add the bitmap method that I described, please
add GP tags to the ACTIONs defined, e.g.

enum kvm_pmu_action_type

and add comments to explain something like below:

Those GP actions are for the filtering of guest events running on the virtual general
purpose counters. The actions to filter guest events running on the virtual fixed
function counters are not added currently as they all seem fine to be used by the
guest so far, but that could be supported on demand in the future via adding new

I think it would be better to add more, please see below:

enum kvm_pmu_action_type {

and do a check in kvm_vm_ioctl_set_pmu_event_filter()
if (filter->action >= KVM_PMU_EVENT_ACTION_MAX)
return -EINVAL;

This is for detecting the case that we add a new action in
userspace, while the kvm hasn't been updated to support that.

KVM_PMU_EVENT_ACTION_NONE is for userspace to remove
the filter after they set it.
We can achieve the same result by using a reject action with an empty
set of events - is there some advantage to "none" over that? I can add
that check for valid actions.

Yes, we could also make it work via passing nevents=0.

I slightly prefer the use of the NONE action here. The advantage is
simpler (less) userspace command. For QEMU, people could
disable the filter list via qmp command, e.g. pmu-filter=none,
instead of pmu-filter=accept,nevents=0.
It also seems more straightforward when people read the usage
manual - a NONE command to cancel the filtering, instead of
"first setting this, then setting that.."
I don't see advantages of using nevents over NONE action.

Anyway, this one isn't a critical issue, and it's up to you here.

Why is this limit needed?
Serves to keep the filters on the smaller side and ensures the size
calculation can't overflow if users attempt to. Keeping the filter
under 4k is nicer for allocation - also, if we want really large
filters we might want to do something smarter than a linear traversal
of the filter when guests program counters.

I think 63 is too small, and it looks like a random number being put here.
From the SDM table 19-3, it seems there are roughly 300 events. Functionally,
the design should support to filter most of them.
(optimization with smarter traversal is another story that could be done later)

#define KVM_PMU_EVENT_FILTER_MAX_EVENTS (PAGE_SIZE - sizeof(struct kvm_pmu_event_filter)) / sizeof (__u64)

and please add some comments above about the consideration that we set this number.

+ kvfree(filter);
Probably better to have it conditionally?

if (filter) {

You may want to factor it out, so that kvm_pmu_destroy could reuse.
Do you mean kvm_arch_destroy_vm? It looks like that's where kvm_arch
members are freed. I can do that.

Sounds good.