Re: hardwired VMI crap
From: Zachary Amsden
Date: Thu Mar 08 2007 - 15:46:42 EST
Thomas Gleixner wrote:
On Thu, 2007-03-08 at 02:06 -0800, Zachary Amsden wrote:
The correct solution here is to properly separate the APIC, SMP, and
timer code so the logic of it which we want to reuse is separated from
the hardware dependence. Clock events and clocksources take care of
most of the timer issues, but there is still ugliness from SMP timer
events depending on having part of the APIC infrastructure for wiring
the interrupt gates.
what are you talking about? A clockevents driver does not need to know
about lapic details, at all. In terms of interrupt gates for the
hypervisor to notify about clock events - use a virtual interrupt
controller via genirq.
See my last e-mail. It is not possible on i386, since local per-cpu
interrupts are only supported via the APIC.
It is not possible from your POV. It is possible, as we have already a
complete irq abstraction layer, which supports _ALL_ of the
requirements.
To make use of it in a maintainable way, it just needs the work of doing
a proper client for the genirq layer, which get's its interrupt injected
by the hypervisor.
genirq() does not care by which mechanism handle_percpu_irq() is called.
We provided the abstractions and you just tell us straight in the face,
that your hypervisor works that way and therefor we have to accept that
you do it that way.
It's not rocket science to implement an abstract interrupt controller,
which lets you inject per cpu or global interrupts into the generic
layer. It needs some preparatory work to distangle the boot code
assumptions from the implicit hardware, but this is a better spent time,
than another set of hackery, which you already advertised for smpboot.c
When we're about two weeks away from a product release and you are
threatening to unmerge or block our code because we didn't create an
abstract interrupt controller, we re-used the APIC and IO-APIC, this is
uber rocket science. We've been doing things this way, with public
patches for over a year, and you've even been CC'd on some of the
discussions. So it is a little late to tell us - "redesign your
hypervisor, or else.."
All we want you and the other hypervisor folks to do is to
- use existing abstractions in the way they are designed
- create new ones where applicable
Great.
- break the hardwired hardware assumptions, so a sane emulation model
can be used.
Why? This is your own invention, as you think it would make life
easier. It doesn't - you still have real hardware to deal with, and
your code will always be designed to operate on silicon with these
hardwired assumptions. Breaking away from that can actually make the
code more complex, both in the hypervisor and in Linux.
So far, all you have done is not complain about our code until it was
merged, the pursue every tactic possible to break it. It is not us that
are stonewalling.
You have been told before. Andi asked you more than once to move to
clockevents.
Which we have done. And now you refuse to give any feedback on
technical points, but maintain an objection to the way we have done it.
If you can not change your hypervisor model to use a sane abstraction of
interrupts, then please emulate lapic, io_apic and everything else
_OUTSIDE_ of the kernel.
We faithfully emulate lapic, io_apic, the pit, pic, and a normal
interrupt subsystem. We can't magically stop using these things because
we have to support traditional full virtualization. Which means any
version of Linux, virtual interrupt controller or not, is going to boot
up, find these things, and try to use them. So for a paravirt kernel,
either we have to disable each of these things in the Linux code or just
re-use them.
So we re-use them. We don't even change their semantics. Where we get
into trouble is the fact that only the lapic can deliver per-cpu timer
IRQs, and we need to provide a better time abstraction than TSC. So we
need a time device, but there is no way to implement it in the
traditional hardware model.
And I ask again for your feedback on which approach you think is correct:
1) Rewrite the interrupt subsystem of our hypervisor, making it
incompatible with full virtualization, so that we can support an
abstract interrupt controller with a "clean" interface
2) Reuse the same method that HPET, PIT and other time clients in i386
use - the global_clock_event pointer which allows you to wrest control
back from the APIC and reuse the lapic_events local clockevents.
3) Create a new low level interrupt handler for the per-cpu VMI timer
IRQs instead of re-using the APIC handler
4) Use the irq APIs to allocate IRQ-0 as a percpu IRQ, then change the
IO-APIC code so it can know not to convert this PIC IRQ into a IO-APIC
edge IRQ.
5) Disable the io-apic code entirely in paravirt mode. Rather than
change it, merge a parallel copy of it into the VMI code
so that we can use the 99% of the code we need, with the one bugfix for
#4 above
6) Disable the apic code entirely in paravirt mode. Rather than change
it, merge a parallel copy of into the VMI code so that we can use the
90% of the code we need, with changes to the LVT0 timer handling.
7) For SMP only, allocate a non-shared IO-APIC IRQ, then after the
IO-APIC is initialized, magically switch this to a percpu handler and
start delivering local timer interrupts via this IRQ.
8) Create a pie-in-the-sky single interrupt source, reserve an IDT
vector for it (or steal the lapic timer slot), and use the irq apis to
set it up to be handled as a per-cpu interrupt. This actually sounds
pretty good, to me. The only problem is we will need to switch the
timer IRQ from IRQ 0 to this vector when the APIC is initialized, but I
think we already have all the machinery we need to handle that.
9) ???
This is a serious question, I would appreciate a serious response
instead of snide comments about the crappiness of our interface and our
code. Which do help a little, because by process of elimination, we
can rule out the approaches you don't like. But it would be more
productive if we could carry on a traditional dialogue and I could just
ask a question and you could answer and vice versa.
Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/