Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

From: Jeremy Fitzhardinge
Date: Wed Mar 07 2007 - 17:05:58 EST


Thomas Gleixner wrote:
> This is tinkering of the best. My understanding of the paravirt
> discussion at Kernel Summit was, that paravirt ops are exactly there to
> prevent the above random hackery in the kernel and to allow _ALL_
> hypervisors to interact via a sane interface inside of the kernel.
>

No, I don't think that was ever the intent. The idea was to create a
new interface for things which don't currently have an interface in the
kernel, such as how to run the CPU in ring 1 and manage pagetable
updates. But an important and explicit intent of the project was to use
existing kernel interfaces where possible, rather than try to make
pv_ops an monster all-encompassing interface.

Using the new time infrastructure was an explicit example of that. We
anticipated that different hypervisors would have different ways of
doing time, but all would be easily accommodated by the
clocksource/events infrastructure, and so each would have its own
implementation for these interfaces. From the kernel's perspective,
they're just another time device, and we manage to avoid making any core
kernel changes, or bloating the pv_ops interface. It seems like a
natural use of the clock subsystem's design.

> You are just perverting the whole idea of a standartized
> paravirtualization interface.
>
> This things can be done for clocksources, clockevents, interrupts (the
> generic irq code allows this) and probaly for a whole bunch of other
> stuff.
>

Yes, exactly. The entirety of the Xen support consists of not only an
implementation of the paravirt_ops interface, but also the Xen
clocksource and clockevents and the Xen irqchip. My hope and intent is
that we can shrink the paravirt_ops interface in favour of using
existing generally useful kernel interfaces.

> The current paravirt interface is completely insane and will explode
> into an unmaintainable nightmare within no time, if we keep accepting
> that crap further.
>

No, that's exactly what we've been trying to avoid.

If we start patching in new paravirt_ops to deal with time, interrupts,
or whatever piece of functionality which already has a perfectly good
kernel interface, then we're just increasing the size of the pv_ops
interface, its entanglement with the rest of the system and the amount
of potential legacy stuff which gets dragged around as the interface
evolves.

As hardware gets better at supporting virtualization directly, we're
going to see more hybrid para- and fully- virtualized hypervisor
interfaces. The result will be that more and more of paravirt_ops will
be implemented by the "native" versions of the functions; maybe at some
point the whole thing will evaporate away.

It's not a huge reach to expect the hardware vendors to get a clue about
time hardware (scratch that, of course it is, but we can always hope)
and come up with something that is directly usable from either an OS
running natively or from within a virtual machine. In that case, I'm
sure you'd agree it would warrant a real clocksource/event
implementation. In the scheme I'm proposing, that's no big deal; you
just register the hardware driver, and that's that. But what you're
proposing leaves this vestigial interface sitting in pv_ops, doing
nothing other than being redundant.


My principle goal here is to get the Xen code into the kernel, and I'm
being pragmatic about it. If you think having a xen_clocksource is an
absolute blocker to merging this stuff, then I'll add the interface to
pv_ops, and we'll work out how to wire all the hypervisors up underneath
that interface. But I think it's precisely the wrong way to go from an
overall kernel perspective.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/