Re: [Xen-devel] [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest

From: Luis R. Rodriguez
Date: Thu Feb 04 2016 - 18:10:43 EST


On Thu, Feb 04, 2016 at 12:51:38AM +0000, Andrew Cooper wrote:
> On 03/02/2016 23:59, Luis R. Rodriguez wrote:
> > On Wed, Feb 03, 2016 at 08:52:50PM +0000, Andrew Cooper wrote:
> >> On 03/02/16 18:55, Luis R. Rodriguez wrote:
> >>> We add a new hypervisor type to close the semantic gap for hypervisor types and,
> >>> much like subarch, also enable a subarch_data to let you pass and use your
> >>> hvmlite_start_info. This would not only help with the semantics but also help
> >>> avoid yet another entry point, and it would force us to provide a well-defined
> >>> structure for annotating code that should not run, by pegging it as required or
> >>> supported for the different early x86 code stubs.
> >> Was I unclear last time? Xen *will not* be introducing Linux-specifics
> >> into the HVMLite starting ABI.
> > This does not have to be "Linux specifics" but rather a light way to enable
> > a hypervisor to clue in *any* OS about its hypervisor type, guest type, and
> > custom hypervisor data that can be used to populate needed OS specifics
> > about the guest. Perhaps Xen's own loader mechanism could be extended just
> > slightly to become *that* standard; it's just that right now it doesn't seem to
> > allow generalizing this in a very useful way for OSes. It's all
> > custom stubs.
>
> There are already standard x86 ways of doing this, via the hypervisor
> cpuid bits. Xen presents itself normally in this regard, as do all the
> other hypervisors.

I don't think this is available early in asm boot, though? It's why I think the
zero page is convenient. The boot loader should in theory know these
things, as well as whether it's in 32-bit mode, 64-bit mode, etc.
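To be concrete about what I mean: the standard hypervisor CPUID probe is easy
enough once you are in C, something along these lines (a sketch modelled on the
usual Xen signature scan, not the exact kernel code); the awkward part is doing
the equivalent from the earliest asm stubs, before you even have a usable stack:

#include <stdint.h>
#include <string.h>

static inline void cpuid(uint32_t leaf, uint32_t *a, uint32_t *b,
			 uint32_t *c, uint32_t *d)
{
	asm volatile("cpuid"
		     : "=a" (*a), "=b" (*b), "=c" (*c), "=d" (*d)
		     : "0" (leaf));
}

/* Scan the hypervisor CPUID leaves for the "XenVMMXenVMM" signature. */
static uint32_t xen_cpuid_base(void)
{
	uint32_t base, eax, sig[3];

	for (base = 0x40000000; base < 0x40010000; base += 0x100) {
		cpuid(base, &eax, &sig[0], &sig[1], &sig[2]);
		if (!memcmp("XenVMMXenVMM", sig, 12) && (eax - base) >= 2)
			return base;
	}
	return 0;
}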

> It is completely backwards to expect a hypervisor (or toolstack in our
> case) to deliberately prod what it suspects might be a Linux binary in a
> way which it thinks a Linux binary might like to be prodded.

Perhaps prodding tons of info would be ludicrous; however, prodding at least a
loader type and a custom data pointer that the stub can then interpret seems
sensible for many reasons, and I don't think prodding two things is much to ask
for, given the possible gains in clean architecture. It's why I am suggesting
that perhaps this should just be standardized.

We need flexibility on both sides.
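Concretely, the two things I have in mind already have a natural home in the
zero page: the hardware_subarch and hardware_subarch_data fields of struct
boot_params. A minimal sketch of what the HVMlite stub could do with them
follows; the X86_SUBARCH_XEN_HVMLITE value and the start-info type are
placeholders for whatever the series ends up defining, not something that
exists today:

#include <linux/init.h>
#include <linux/types.h>
#include <asm/bootparam.h>

#define X86_SUBARCH_XEN_HVMLITE	5	/* placeholder value, not allocated */

struct hvm_start_info;			/* the Xen-provided start info */

/*
 * Sketch only: record the guest type plus a pointer to the
 * hypervisor-provided data in the zero page, so generic early code can
 * key off them instead of needing a separate entry point.
 */
static void __init hvmlite_fill_boot_params(struct boot_params *bp,
					    struct hvm_start_info *si)
{
	bp->hdr.hardware_subarch      = X86_SUBARCH_XEN_HVMLITE;
	bp->hdr.hardware_subarch_data = (u64)(unsigned long)si;
}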

> >> Your perceived problem with multiple entry points is not a problem with
> >> multiple entry points; It is a problem with multiple different paths
> >> performing the same initialisation.
> > It's actually more of an issue with the lack of strong general semantics
> > available for the different hypervisors, guest types, and requirements of x86's
> > init path. What you end up with as collateral is multiple entry points, and
> > these can be sloppy and, as you note, can perform the same initialisation.
> > Another issue is the inability to proactively ensure that new x86 init code
> > addresses the different x86 requirements (the cr4 shadow regression and Kasan still
> > being broken on Xen are two examples), and it just so happens that the lack of
> > semantics for the different guest types that need to be evaluated is one such
> > issue for x86.
> >
> > We can do better.
>
> Even with a perfect startup() routine which caters for all runtime
> usecases, you cannot avoid having multiple entry stubs to cater for the
> different ways the binary might be started.
>
> Unless you are volunteering to write a single stub which can first
> evaluate whether it is in 16/32/64bit mode, then create a safe stack to
> use, then evaluate how it was started (multiboot, legacy BIOS, EFI,
> etc.) and turn all this information into a zeropage.
>
> I don't know that would be possible, but the point is moot as it
> definitely wouldn't be maintainable if it were possible.

I think some folks have hope that at least some of it might be. I can't do this
myself, otherwise I would have done it already. Given my review of the commit logs
and code for the different entry points, I do think it's sensible to want this to
help with startup semantics, which in turn should help with duplication and bugs,
but I obviously do not doubt its difficulty.

It's at least sensible in my mind to strive towards the best possible semantics
and code sharing from x86-64 onwards, and if I can help with that I'll do
what I can.

> >> The Linux entry for PV guests is indeed completely horrible. I am not
> >> trying to defend it in the slightest.
> >>
> >> However, the HVMLite entry which is a very short stub that sets up a
> >> zeropage and hands off to the native start routine is fine.
> > It's alright, and a huge stepping stone towards good architecture. We can,
> > however, do better.
>
> Then we are generally in agreement. However, at the risk of sounding
> like a grumpy old sod, take this win first and then work on the next
> stepping stone.

I think this is fair.

> Review comments identifying "I am working on implementing a new $X which
> will make this area better/easier/more shiny in the future" are fine.
> Review comments complaining that "you haven't used this shiny new $X
> which doesn't exist yet" are a waste of time; time which would be
> better spent implementing said $X.
>
> No one is disagreeing that improvements can be made, but don't try to do
> them all at once, or nothing will get done.

It's a fair point. The only contentious issue here is the use of
paravirt_enabled() and how I'm changing this to paravirt_legacy();
I think that's it. Other than that, we can coordinate on both fronts
to help clean things up further later.
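For reference, what I have in mind there is little more than a rename with a
tighter meaning: paravirt_enabled() today just reports pv_info.paravirt_enabled,
and the callers I am touching really mean "am I a legacy PV guest?". Roughly
this shape (how I would express it, not necessarily the final patch):

/*
 * Sketch: same mechanics as paravirt_enabled(), but the name says what
 * the remaining callers actually mean -- a legacy PV guest check.
 */
static inline int paravirt_legacy(void)
{
	return pv_info.paravirt_enabled;
}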

> >> There is still just one routine performing native x86 startup.
> >>
> >> If you still desperately want to avoid multiple entry points, then just
> >> insist on using grub for the VM. I expect that that is how most people
> >> will end up using HVMLite VMs anyway.
> > Are you saying Grub can do some of this heavy lifting that I am trying
> > to avoid? If so that'd be great news.
>
> There are two different ways of running your VM, depending on your usecase.
>
> The more traditional approach of a full OS will want to load a kernel
> out of the guest's filesystem, according to the guest's bootloader
> configuration. At this point it doesn't matter for Linux as it will be
> booted by some already-existing bootloader which already knows how to do
> the job.

The grub instance would be on the guest filesystem, right? So grub would know
it got kicked by Xen, and grub could then prod the kernel in the way we think is
right? In other words, you pass the custom Xen struct on to grub, and then grub
does the zero page filling.
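i.e. something shaped like the following, where whatever knows it was started by
Xen translates the Xen-provided information into the standard zero page before
jumping to the native entry point. The boot_params field names are real; the
hvm_start_info member name is my assumption of what the ABI provides:

#include <linux/string.h>
#include <linux/types.h>
#include <asm/bootparam.h>

/* Placeholder for whatever Xen hands the loader; member name assumed. */
struct hvm_start_info {
	u64 cmdline_paddr;
	/* ... */
};

/*
 * Sketch of the division of labour: an intermediate loader (grub, or a
 * tiny stub) that knows it was started by Xen builds the zero page
 * itself, so the kernel only ever sees the native boot protocol.
 */
static void loader_build_zeropage(struct boot_params *bp,
				  const struct hvm_start_info *si)
{
	memset(bp, 0, sizeof(*bp));
	bp->hdr.type_of_loader = 0xff;			/* "other"/unknown loader */
	bp->hdr.cmd_line_ptr = (u32)si->cmdline_paddr;
	bp->hdr.hardware_subarch_data = (u64)(unsigned long)si;
}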

> The more container/unikernel oriented approach is to boot an image
> directly from dom0, skipping the middle layers of firmware and
> filesystems. For this to work, Linux needs to be able to be started via
> the hypervisor ABI of choice, which in this case is something which
> looks very much like multiboot.

I see, thanks. And the hypervisor ABI would in no way ever consider an option to
let OSes have two pieces of info set, the hypervisor guest type and a pointer to
a custom data structure for that guest type, in order to help with a cleaner
startup on OSes?

Luis