Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event

From: Manali Shukla
Date: Tue Oct 03 2023 - 13:32:25 EST


On 10/3/2023 1:46 PM, Peter Zijlstra wrote:
> On Mon, Oct 02, 2023 at 05:56:28PM -0700, Sean Christopherson wrote:
>> On Mon, Oct 02, 2023, Peter Zijlstra wrote:
>
>>> I'm not sure what you're suggesting here. It will have to save/restore
>>> all those MSRs anyway. Suppose it switches between vCPUs.
>>
>> The "when" is what's important. If KVM took a literal interpretation of
>> "exclude guest" for pass-through MSRs, then KVM would context switch all those
>> MSRs twice for every VM-Exit=>VM-Enter roundtrip, even when the VM-Exit isn't a
>> reschedule IRQ to schedule in a different task (or vCPU). The overhead to save
>> all the host/guest MSRs and load all of the guest/host MSRs *twice* for every
>> VM-Exit would be a non-starter. E.g. simple VM-Exits are completely handled in
>> <1500 cycles, and "fastpath" exits are something like half that. Switching all
>> the MSRs is likely 1000+ cycles, if not double that.
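
A minimal sketch of what that literal interpretation would look like; the
MSR list below is truncated and swap_pmu_msrs() is a made-up helper for
illustration, not actual KVM code:

    /*
     * Hypothetical: swap every PMU MSR on each VM-Exit and again on
     * VM-Enter.  Each rdmsrl()/wrmsrl() pair costs tens of cycles, so
     * a couple dozen MSRs swapped twice per exit eats the <1500 cycle
     * budget of a simple VM-Exit all by itself.
     */
    static const u32 guest_pmu_msrs[] = {
        MSR_CORE_PERF_GLOBAL_CTRL,
        MSR_CORE_PERF_FIXED_CTR_CTRL,
        MSR_ARCH_PERFMON_EVENTSEL0,
        MSR_ARCH_PERFMON_PERFCTR0,
        /* ... remaining selectors and counters ... */
    };

    static void swap_pmu_msrs(u64 *save, const u64 *load)
    {
        int i;

        for (i = 0; i < ARRAY_SIZE(guest_pmu_msrs); i++) {
            rdmsrl(guest_pmu_msrs[i], save[i]);
            wrmsrl(guest_pmu_msrs[i], load[i]);
        }
    }
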
>
> See, you're the virt-nerd and I'm sure you know what you're talking
> about, but I have no clue :-) I didn't know there were different levels
> of vm-exit.
>
>> FWIW, the primary use case we care about is for slice-of-hardware VMs, where each
>> vCPU is pinned 1:1 with a host pCPU.
>
> I've been given to understand that vm-exit is a bad word in this
> scenario; any exit is a fail. They get MWAIT and all the other crap and
> more or less pretend to be real hardware.
>
> So why do you care about those MSRs so much? That should 'never' happen
> in this scenario.
>
>>>> Or at least, that was my reading of things. Maybe it was just a
>>>> misunderstanding because we didn't do a good job of defining the behavior.
>>>
>>> This might be the case. I don't particularly care where the guest
>>> boundary lies -- somewhere in the vCPU thread. Once the thread is gone,
>>> PMU is usable again etc..
>>
>> Well drat, that would have saved a wee bit of frustration. Better late
>> than never though, that's for sure.
>>
>> Just to double-confirm: keeping guest PMU state loaded until the vCPU is scheduled
>> out or KVM exits to userspace would mean that host perf events won't be active
>> for potentially large swaths of non-KVM code. Any function calls or event/exception
>> handlers that occur within the context of ioctl(KVM_RUN) would run with host
>> perf events disabled.
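
A sketch of that boundary, assuming the PMU swap hangs off the existing
kvm_arch_vcpu_load()/kvm_arch_vcpu_put() hooks; every pmu/perf helper
below is invented for illustration:

    /*
     * Hypothetical helpers, called from kvm_arch_vcpu_load() and
     * kvm_arch_vcpu_put() respectively.  Guest PMU state stays loaded
     * for the whole KVM_RUN critical section, so host perf events are
     * parked for everything that runs in between.
     */
    static void guest_pmu_load(struct kvm_vcpu *vcpu)
    {
        perf_guest_enter();          /* hypothetical: park host events */
        load_guest_pmu_state(vcpu);  /* hypothetical: install guest MSRs */
    }

    static void guest_pmu_put(struct kvm_vcpu *vcpu)
    {
        save_guest_pmu_state(vcpu);  /* hypothetical: stash guest MSRs */
        perf_guest_exit();           /* hypothetical: resume host events */
    }
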
>
> Hurmph, that sounds sub-optimal; earlier you said <1500 cycles, and this
> all sounds like a ton more.
>
> /me frobs around the kvm code some...
>
> Are we talking about the exit_fastpath loop in vcpu_enter_guest()? That
> seems to run with IRQs disabled, so at most you can trigger a #PF or
> something, which will then trip an exception fixup because you can't run
> #PF with IRQs disabled, etc.
>
> That seems fine. That is, a theoretical kvm_x86_handle_enter_irqoff()
> coupled with the existing kvm_x86_handle_exit_irqoff() seems like
> a reasonable solution from where I'm sitting. That also more or less
> matches the FPU state save/restore AFAICT.
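
Condensed, the shape would be something like the below, where
kvm_x86_handle_enter_irqoff() is the theoretical new hook and the other
two calls mirror today's vcpu_enter_guest():

    local_irq_disable();

    /* theoretical: load guest PMU state with IRQs off */
    static_call(kvm_x86_handle_enter_irqoff)(vcpu);

    exit_fastpath = static_call(kvm_x86_vcpu_run)(vcpu);

    /* existing hook: runs before IRQs are re-enabled */
    static_call(kvm_x86_handle_exit_irqoff)(vcpu);

    local_irq_enable();
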
>
> Or are you talking about the whole of vcpu_run()? That seems like a
> massive amount of code, and doesn't look like anything I'd call a
> fast-path. Also, much of that loop has preemption enabled...
>
>> Are you ok with that approach? Assuming we don't completely botch things, the
>> interfaces are sane, and we can come up with a clean solution for handling NMIs,
>> etc.
>
> Since you steal the whole PMU, can't you re-route the PMI to something
> that's virt-friendly too?
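
Presumably that would mean re-programming the local APIC's LVTPC entry
away from NMI delivery while the guest PMU is loaded; a rough sketch,
with the guest vector name invented:

    /* hypothetical vector dedicated to guest PMIs */
    apic_write(APIC_LVTPC, GUEST_PMI_VECTOR);

    /* ... vCPU runs with the guest's PMU state ... */

    /* restore perf's normal NMI delivery when the PMU is put */
    apic_write(APIC_LVTPC, APIC_DM_NMI);
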
>
>>> It also means ::exclude_guest should actually work -- it often does not
>>> today -- the IBS thing for example totally ignores it.
>>
>> Is that already in-tree, or are you talking about Manali's proposed series to
>> support virtualizing IBS?
>
> The IBS code, as is, totally ignores ::exclude_guest. Manali was going
> to add some of it. But I'm not at all sure about the state of the other
> PMU drivers we have.
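
For a driver that cannot filter guest-mode samples in hardware, the
minimal honest behaviour would be to refuse such events rather than
silently miscount; a sketch with a made-up driver name:

    static int my_pmu_event_init(struct perf_event *event)
    {
        /*
         * Hypothetical driver with no hardware guest/host filtering:
         * reject ::exclude_guest instead of silently ignoring it the
         * way IBS does today.
         */
        if (event->attr.exclude_guest)
            return -EINVAL;

        return 0;
    }
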
>
> Just for giggles, P4 has VMX support... /me runs like crazy

I am working on Solution 1.1 from the approach proposed in [*], and will
send v2 of the IBS virtualization series based on it shortly.

* https://lore.kernel.org/all/20230908133114.GK19320@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#m7389910e577966c93a0b50fbaf9442be80dc730b

- Manali