Re: [PATCH RFC 10/39] KVM: x86/xen: support upcall vector

From: David Woodhouse
Date: Fri Jan 01 2021 - 09:39:20 EST


On Wed, 2020-12-02 at 18:34 +0000, Joao Martins wrote:
> > But if the kernel is going to short-circuit the IPIs and VIRQs, then
> > it's already going to have to handle the evtchn_pending/evtchn_mask
> > bitmaps, and actually injecting interrupts.
> >
>
> Right. I was trying to point that out in the discussion we had
> in next patch. But true be told, more about touting the idea of kernel
> knowing if a given event channel is registered for userspace handling,
> rather than fully handling the event channel.
>
> I suppose we are able to provide both options to the VMM anyway
> i.e. 1) letting them handle it enterily in userspace by intercepting
> EVTCHNOP_send, or through the irq route if we want kernel to offload it.
>
> > Given that it has to have that functionality anyway, it seems saner to
> > let the kernel have full control over it and to just expose
> > 'evtchn_send' to userspace.
> >
> > The alternative is to have userspace trying to play along with the
> > atomic handling of those bitmasks too
>
> That part is not particularly hard -- having it done already.

Right, for 2-level it works out fairly well. I like your model of
installing { vcpu_id, port } into the KVM IRQ routing table and that's
enough for the kernel to raise the event channels that it *needs* to
know about, while userspace can do the others for itself. It's just
atomic test-and-set bitop stuff with no real need for coordination.

For FIFO event channels it gets more fun, because the updates aren't
truly atomic — they require a spinlock around the three operations that
the host has to perform when linking an event into a queue:

• Set the new port's LINKED bit
• Set the previous head's LINK to point to the new port
• Store the new port# as the head.

One option might be to declare that for FIFO, all updates for a given
queue *must* be handled either by the kernel, or by userspace, and
there's sharing of control.

Or maybe there's a way to keep the kernel side simple by avoiding the
tail handling completely. Surely we only really care about kernel
handling of the *fast* path, where a single event channel is triggered
and handled immediately? In the high-latency case where we're gathering
multiple events in a queue before the guest ever gets to process them,
we might as well let that be handled by userspace, right?

So in that case, perhaps the kernel part could forget all the horrid
nonsense with tracking the tail of the queue. It would handle the event
in-kernel *only* in the case where the event is the *first* in the
queue, and the head was previously zero?

But even that isn't a simple atomic operation though; we still have to
mark the event LINKED, then update the head pointer to reference it.
And what if we set the 'LINKED' bit but then find that userspace has
already linked another port so ->head is no longer zero?

Can we just clear the LINKED bit and then punt the whole event for
userspace to (re)handle? Or raise a special event to userspace so it
knows it needs to go ahead and link the port even though its LINKED bit
has already been set?

None of the available options really fill me with joy; I'm somewhat
inclined just to declare that the kernel won't support acceleration of
FIFO event channels at all.

None of which matters a *huge* amount right now if I was only going to
leave that as a future optimisation anyway.

What it does actually mean in the short term is that as I update your
KVM_IRQ_ROUTING_XEN_EVTCHN support, I probably *won't* bother to add a
'priority' field to struct kvm_irq_routing_xen_evtchn to make it
extensible to FIFO event channels. We can always add that later.

Does that seem reasonable?

Attachment: smime.p7s
Description: S/MIME cryptographic signature