Re: [PATCH RFC] kvm: enable irq injection from interrupt context

From: Gleb Natapov
Date: Thu Sep 16 2010 - 10:06:21 EST


On Thu, Sep 16, 2010 at 03:51:37PM +0200, Michael S. Tsirkin wrote:
> On Thu, Sep 16, 2010 at 03:18:23PM +0200, Gleb Natapov wrote:
> > On Thu, Sep 16, 2010 at 02:57:17PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Sep 16, 2010 at 02:33:01PM +0200, Gleb Natapov wrote:
> > > > On Thu, Sep 16, 2010 at 02:13:38PM +0200, Michael S. Tsirkin wrote:
> > > > > > > We haver two users: qemu does deasserts, vhost-net does asserts.
> > > > > > Well this is broken. You want KVM to track level for you and this is
> > > > > > wrong. KVM does this anyway because it can't relay on devise model
> > > > > > to behave correctly [0], but in your case it is designed to behave
> > > > > > incorrectly.
> > > > > >
> > > > > > Interrupt type is a device property. PCI devices just happen to be level
> > > > > > triggered according to PCI spec. What if you want to use vhost-net to
> > > > > > implement network device which has active-low interrupt line? [1]
> > > > >
> > > > > The polarity would have to be reversed in gsi (irq line can be shared,
> > > > > all devices must be active high or low consistently).
> > > > >
> > > > There are gsi dedicated to PCI. They can be shared only between PCI
> > > > devices.
> > > >
> > > > > > If you want to split parts that asserts irq and de-asserts it then we
> > > > > > should have irqfd that tracks line status and knows interrupt line
> > > > > > polarity.
> > > > >
> > > > > Yes, it can know about polarity even though I think it's cleaner to do this
> > > > > per gsi. But it can not track line status as line is shared with
> > > > > other devices.
> > > > It should track only device's line status.
> > >
> > > There is no such thing as device's line status on real hardware, either.
> > > Devices do not drive INT# high: they drive it low (all the time)
> > > or do not drive it at all.
> > Same thing, other naming. Device either drive it low (irq_set(1)) or
> > not (irq_set(0)).
>
> Well, if I drive it low any number of times it should hae no effect.
>
There is no meaning of "driving low the line multiple times" on real HW.
You either drive it low or not. We try to emulate this with individual
"drive low/high" events in software.

> > >
> > > Or consider express, the spec explicitly says:
> > > "Note: Duplicate Assert_INTx/Deassert_INTx Messages have no effect, but
> > > are not errors."
> > >
> > > > >
> > > > > > > Another application is out of process virtio (sandboxing!).
> > > > > > It will still assert and de-assert irq at the same code, so it will be
> > > > > > able to track irq line status.
> > > > > >
> > > > > > > Again, pci stuff needs to stay in qemu.
> > > > > > >
> > > > > >
> > > > > > Nothing to do with PCI whatsoever.
> > > > > >
> > > > > > [0] most qemu devices behave incorrectly and trigger level irq more then
> > > > > > needed.
> > > > >
> > > > > Which devices?
> > > > Most of them. They just call update_irq_status() or something and
> > > > re-assert interrupt regardless of what previous status was.
> > >
> > > At least for PCI devices, these calls do nothing if status does not change.
> > I am not sure what code are you locking at. e1000 device emulation
> > doesn't check previous line status before calling qemu_set_irq().
>
> Right. If you dig through useless levels of indirection, you will
> see that each PCI device has 4 pin levels, when one of these
> changes this makes it up level to the parent bus, and so on.
Yes. Qemu PCI level does it right. Ideally device would not even invoke
this logic if interrupt status haven't changed.

>
> > >
> > > > > pci core tracks line status and will never assert the same
> > > > > line multiple times.
> > > > That's good if pci core does this, but device shouldn't even try it.
> > >
> > > I disagree. We don't want to duplicate a ton of code all over
> > > the codebase.
> > >
> > So abstract it into a function. It shouldn't be part of PCI emulation.
>
> I don't know what you mean by this, send a patch and we can discuss?
I don't care enough to send patch. Just remember previous irq status
and do not call qemu_set_irq() if it doesn't change. Three lines of
code.

> Note that when I patches PCI interrupt handling for compliance
> I made it mimic hardware as closely as possible: devices
> can send any # of assert/deassert messages, bus discards duplicates.
>
Qemu PCI code is correct as far as I can see. Not all devices are connected
via PCI and there is not need to go through couple of layer of
indirection to figure out that nothing should be done.


> > > > >
> > > > > > [1] this is how correct PCI device should behave but we override
> > > > > > polarity in ACPI, but now incorrect behaviour is deeply designed
> > > > > > into vhost-net.
> > > > >
> > > > > Not really, vhost net signals an eventfd. What happens then is
> > > > > up to kvm.
> > > > >
> > > > That is what current broken design does and it works, but if you want to
> > > > save unneeded calls into kvm fix design.
> > >
> > > The interface seems clean enough: vhost handles virtio ring, qemu/kvm handle pci.
> > > Making vhost aware of pci breaks this, I would not call that fixing the
> > > design.
> > >
> > Once again. Nothing to do with PCI, everything to do with device
> > emulation. And I propose to abstract interrupt assertion part into
> > irqfd, not into vhost.
> >
> > --
> > Gleb.
>
> This could work. KVM would need to find all irqfd
> objects mapped to gsi and notify them on deassert.
> No idea how hard this is.
>
What for? Device emulation should do de-assert.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/