Re: [PATCH v3] KVM: x86: Use fast path for Xen timer delivery

From: David Woodhouse
Date: Tue Feb 06 2024 - 22:30:07 EST


On Tue, 2024-02-06 at 18:58 -0800, Sean Christopherson wrote:
> On Tue, Feb 06, 2024, David Woodhouse wrote:
> > On Tue, 2024-02-06 at 10:41 -0800, Sean Christopherson wrote:
> > >
> > > This has an obvious-in-hindsight recursive deadlock bug.  If KVM actually needs
> > > to inject a timer IRQ, and the fast path fails, i.e. the gpc is invalid,
> > > kvm_xen_set_evtchn() will attempt to acquire xen.xen_lock, which is already held
> >
> > Hm, right. In fact, kvm_xen_set_evtchn() shouldn't actually *need* the
> > xen_lock in an ideal world; it's only taking it in order to work around
> > the fact that the gfn_to_pfn_cache doesn't have its *own* self-
> > sufficient locking. I have patches for that...
> >
> > I think the *simplest* of the "patches for that" approaches is just to
> > use the gpc->refresh_lock to cover all activate, refresh and deactivate
> > calls. I was waiting for Paul's series to land before sending that one,
> > but I'll work on it today, and double-check my belief that we can then
> > just drop xen_lock from kvm_xen_set_evtchn().
>
> While I definitely want to get rid of arch.xen.xen_lock, I don't want to address
> the deadlock by relying on adding more locking to the gpc code.  I want a teeny
> tiny patch that is easy to review and backport.  Y'all are *probably* the only
> folks that care about Xen emulation, but even so, that's not a valid reason for
> taking a roundabout way of fixing a deadlock.

I strongly disagree. I get that you're reticent about fixing the gpc
locking, but what I'm proposing is absolutely *not* a 'roundabout way
of fixing a deadlock'. The kvm_xen_set_evtchn() function shouldn't
*need* that lock; it's only taking it because of the underlying problem
with the gpc itself, which needs its caller to do its locking for it.

The solution is not to do further gymnastics with the xen_lock.
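
To make that concrete, the shape I have in mind is roughly the below
(illustrative sketch only, not against any particular tree, and the
__kvm_gpc_activate() helper is just a placeholder name):

#include <linux/kvm_host.h>

/*
 * Sketch: make the gfn_to_pfn_cache serialise its own lifecycle on
 * gpc->refresh_lock, so that callers such as kvm_xen_set_evtchn() no
 * longer need an external lock (xen_lock) just to stop the cache from
 * being deactivated underneath them.
 */
int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
{
        int ret;

        mutex_lock(&gpc->refresh_lock);
        ret = __kvm_gpc_activate(gpc, gpa, len);        /* placeholder helper */
        mutex_unlock(&gpc->refresh_lock);

        return ret;
}

Refresh and deactivate would take the same mutex, so consumers of the
cache only need read_lock(&gpc->lock) plus kvm_gpc_check() around the
actual access, which is all the kvm_xen_set_evtchn() path wants.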

> Can't we simply not take xen_lock in kvm_xen_vcpu_get_attr()?  It holds vcpu->mutex
> so it's mutually exclusive with kvm_xen_vcpu_set_attr(), and I don't see any
> flows other than vCPU destruction that deactivate (or change) the gpc.

Maybe. But with the gpc locking being incomplete, I'm extremely
concerned about something *implicitly* relying on the xen_lock. We
still need to fix the gpc to have self-contained locking.

I'll put something together and do some testing.
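
In the meantime, for reference, the minimal form of your suggestion
would presumably be something like the below in arch/x86/kvm/xen.c
(untested sketch, with the per-attribute cases elided):

int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
{
        int r = -ENOENT;

        /*
         * No mutex_lock(&vcpu->kvm->arch.xen.xen_lock) here: the state
         * being read is only changed by kvm_xen_vcpu_set_attr() and by
         * vCPU destruction, both of which are mutually exclusive with
         * this path via vcpu->mutex.
         */
        switch (data->type) {
        /* ... per-attribute cases as before ... */
        default:
                break;
        }

        return r;
}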
