Re: [PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments

From: Jan Beulich
Date: Tue Sep 14 2021 - 05:03:37 EST


On 14.09.2021 10:32, Roger Pau Monné wrote:
> On Tue, Sep 07, 2021 at 12:04:34PM +0200, Jan Beulich wrote:
>> In order to try to debug hypervisor side breakage from XSA-378 I found
>> myself urged to finally give PVH Dom0 a try. Sadly things didn't work
>> quite as expected. In the course of investigating these issues I actually
>> spotted one piece of PV Dom0 breakage as well, a fix for which is also
>> included here.
>>
>> There are two immediate remaining issues (also mentioned in affected
>> patches):
>>
>> 1) It is not clear to me how PCI device reporting is to work. PV Dom0
>> reports devices as they're discovered, including ones the hypervisor
>> may not have been able to discover itself (ones on segments other
>> than 0 or hotplugged ones). The respective hypercall, however, is
>> inaccessible to PVH Dom0. Depending on the answer to this, either
>> the hypervisor will need changing (to permit the call) or patch 2
>> here will need further refinement.
>
> I would rather prefer if we could limit the hypercall usage to only
> report hotplugged segments to Xen. Then Xen would have to scan the
> segment when reported and add any devices found.
>
> Such hypercall must be used before dom0 tries to access any device, as
> otherwise the BARs won't be mapped in the second stage translation and
> the traps for the MCFG area won't be set up either.

This might work if hotplugging were only ever of segments, and not
of individual devices. Yet the latter is, I think, a common case (as
far as hotplugging itself is "common").

Also don't forget about SR-IOV VFs - they would typically not be there
when booting. They would materialize when the PF driver initializes
the device. This is, I think, something that can be dealt with by
intercepting writes to the SR-IOV capability. But I wonder whether
there might be other cases where devices become "visible" only while
the Dom0 kernel is already running.
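
To illustrate the SR-IOV point, here is a minimal sketch of what such an
intercept could look for. The register offsets and the VF Enable bit are
those of the SR-IOV extended capability in the PCIe spec; struct
sriov_state and both helper functions are hypothetical names made up for
this example and are not existing Xen code. It only detects the 0 -> 1
transition of VF Enable and derives the routing IDs of the VFs that
would then need registering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register offsets within the SR-IOV extended capability (PCIe spec). */
#define SRIOV_CTRL       0x08   /* SR-IOV Control, 16 bits */
#define SRIOV_CTRL_VFE   0x0001 /* VF Enable bit in SR-IOV Control */
#define SRIOV_NUM_VF     0x10   /* NumVFs, 16 bits */
#define SRIOV_VF_OFFSET  0x14   /* First VF Offset, 16 bits */
#define SRIOV_VF_STRIDE  0x16   /* VF Stride, 16 bits */

/* Hypothetical per-PF state an intercept handler might track. */
struct sriov_state {
    uint16_t pf_rid;    /* PF routing ID: (bus << 8) | devfn */
    uint16_t ctrl;      /* last value written to SR-IOV Control */
    uint16_t num_vfs;   /* value of NumVFs */
    uint16_t vf_offset; /* value of First VF Offset */
    uint16_t vf_stride; /* value of VF Stride */
};

/*
 * Record a 16-bit write to SR-IOV Control. Returns true only when the
 * write flips VF Enable from 0 to 1, i.e. when VFs are about to appear.
 */
static bool sriov_ctrl_write(struct sriov_state *s, uint16_t val)
{
    bool enabling = !(s->ctrl & SRIOV_CTRL_VFE) && (val & SRIOV_CTRL_VFE);

    s->ctrl = val;
    return enabling;
}

/* Routing ID of VF number n (0-based), following the PCIe formula. */
static uint16_t sriov_vf_rid(const struct sriov_state *s, uint16_t n)
{
    assert(n < s->num_vfs);
    return (uint16_t)(s->pf_rid + s->vf_offset + n * s->vf_stride);
}
```

A real handler would additionally have to read NumVFs, First VF Offset,
and VF Stride back from the device at the time VF Enable gets set, since
the latter two may change with NumVFs.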

>> 2) Dom0, unlike in the PV case, cannot access the screen (to use as a
>> console) when in a non-default mode (i.e. not 80x25 text), as the
>> necessary information (in particular about VESA-based LFB modes) is
>> not communicated. On the hypervisor side this looks like deliberate
>> behavior, but it is unclear to me what the intentions were towards
>> an alternative model. (X may be able to access the screen depending
>> on whether it has a suitable driver besides the presently unusable
>> /dev/fb<N> based one.)
>
> I have to admit most of my boxes are headless servers, albeit I have
> one NUC I can use to test gfx stuff, so I don't really use gfx output
> with Xen.
>
> As I understand such information is fetched from the BIOS and passed
> into Xen, which should then hand it over to the dom0 kernel?

That's how PV Dom0 learns of the information, yes. See
fill_console_start_info(). (I'm in the process of eliminating the
need for some of the "fetch from BIOS" in Xen right now, but that's
not going to get us as far as being able to delete that code, no
matter how much in particular Andrew would like that to happen.)

> I guess the only way for Linux dom0 kernel to fetch that information
> would be to emulate the BIOS or drop into realmode and issue the BIOS
> calls?

Native Linux gets this information passed from the boot loader, I think
(except in the EFI case, as per below).

> Is that an issue on UEFI also, or there dom0 can fetch the framebuffer
> info using the PV EFI interface?

There it's EFI boot services functions which can be invoked before
leaving boot services (in the native case). Aiui the PVH entry point
lives logically past any EFI boot services interaction, and hence
using them is not an option (if there was EFI firmware present in Dom0
in the first place, which I consider difficult all by itself - this
can't be the physical system's firmware, but I also don't see where
virtual firmware would be taken from).

There is no PV EFI interface to obtain video information. With the
needed information getting passed via start_info, PV has no need for
such, and I would be hesitant to add a fundamentally redundant
interface for PVH, all the more so since the information needed isn't
EFI-specific at all.
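
As a rough illustration of why nothing here is tied to EFI: a linear
frame buffer console needs little more than the geometry below,
whichever firmware interface it was obtained from. This is a sketch
only; struct lfb_mode and its helpers are hypothetical names for this
example, not the layout actually passed in start_info, and it assumes
bits_per_pixel is a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, firmware-agnostic description of a linear frame buffer. */
struct lfb_mode {
    uint64_t base;           /* physical address of the frame buffer */
    uint32_t width;          /* visible pixels per line */
    uint32_t height;         /* visible lines */
    uint32_t bytes_per_line; /* pitch; may exceed width * bytes per pixel */
    uint8_t  bits_per_pixel; /* assumed to be a multiple of 8 here */
};

/* Byte offset of pixel (x, y) from the start of the frame buffer. */
static size_t lfb_pixel_offset(const struct lfb_mode *m,
                               uint32_t x, uint32_t y)
{
    assert(x < m->width && y < m->height);
    return (size_t)y * m->bytes_per_line +
           (size_t)x * (m->bits_per_pixel / 8);
}

/* Smallest mapping covering all visible lines. */
static size_t lfb_min_size(const struct lfb_mode *m)
{
    return (size_t)m->height * m->bytes_per_line;
}
```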

Jan