Re: [PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments
From: Roger Pau Monné
Date: Tue Sep 14 2021 - 08:41:21 EST
On Tue, Sep 14, 2021 at 01:58:29PM +0200, Jan Beulich wrote:
> On 14.09.2021 13:15, Roger Pau Monné wrote:
> > On Tue, Sep 14, 2021 at 11:03:23AM +0200, Jan Beulich wrote:
> >> On 14.09.2021 10:32, Roger Pau Monné wrote:
> >>> On Tue, Sep 07, 2021 at 12:04:34PM +0200, Jan Beulich wrote:
> >>>> In order to try to debug hypervisor side breakage from XSA-378 I found
> >>>> myself urged to finally give PVH Dom0 a try. Sadly things didn't work
> >>>> quite as expected. In the course of investigating these issues I actually
> >>>> spotted one piece of PV Dom0 breakage as well, a fix for which is also
> >>>> included here.
> >>>>
> >>>> There are two immediate remaining issues (also mentioned in affected
> >>>> patches):
> >>>>
> >>>> 1) It is not clear to me how PCI device reporting is to work. PV Dom0
> >>>> reports devices as they're discovered, including ones the hypervisor
> >>>> may not have been able to discover itself (ones on segments other
> >>>> than 0 or hotplugged ones). The respective hypercall, however, is
> >>>> inaccessible to PVH Dom0. Depending on the answer to this, either
> >>>> the hypervisor will need changing (to permit the call) or patch 2
> >>>> here will need further refinement.
> >>>
> >>> I would rather prefer if we could limit the hypercall usage to only
> >>> report hotplugged segments to Xen. Then Xen would have to scan the
> >>> segment when reported and add any devices found.
> >>>
> >>> Such hypercall must be used before dom0 tries to access any device, as
> >>> otherwise the BARs won't be mapped in the second stage translation and
> >>> the traps for the MCFG area won't be setup either.
> >>
> >> This might work if hotplugging would only ever be of segments, and not
> >> of individual devices. Yet the latter is, I think, a common case (as
> >> far as hotplugging itself is "common").
> >
> > Right, I agree to use hypercalls to report either hotplugged segments
> > or devices. However I would like to avoid mandating usage of the
> > hypercall for non-hotplug stuff, as then OSes not having hotplug
> > support don't really need to care about making use of those
> > hypercalls.
> >
> >> Also don't forget about SR-IOV VFs - they would typically not be there
> >> when booting. They would materialize when the PF driver initializes
> >> the device. This is, I think, something that can be dealt with by
> >> intercepting writes to the SR-IOV capability.
> >
> > My plan was to indeed trap SR-IOV capability accesses, see:
> >
> > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fxen-devel%2F20180717094830.54806-1-roger.pau%40citrix.com%2F&data=04%7C01%7Croger.pau%40citrix.com%7C35d2502d0128484e229e08d97777087f%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637672175399546062%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=sSeE%2F4wEo5%2Fplkj2yH%2B1kpHi5c15lxJxeUxx6Cbyr4s%3D&reserved=0
> >
> > I just don't have time ATM to continue this work.
> >
> >> But I wonder whether
> >> there might be other cases where devices become "visible" only while
> >> the Dom0 kernel is already running.
> >
> > I would consider those kind of hotplug devices, and hence would
> > require the use of the hypercall in order to notify Xen about them.
>
> So what does this mean for the one patch? Should drivers/xen/pci.c
> then be built for PVH (and then have logic added to filter boot
> time device discovery), or should I restrict this to be PV-only (and
> PVH would get some completely different logic added later)?
I think we can reuse the same hypercalls for PVH, and maybe the same
code in Linux. For PVH we just need to be careful to make the
hypercalls before attempting to access the BARs (or the PCI
configuration space for the device) since there won't be any traps
setup, and BARs won't be mapped on the p2m.
It might be easier for Linux to just report every device it finds to
Xen, like it's currently done for PV dom0, instead of filtering on
whether the device has been hotplugged.
> >>>> 2) Dom0, unlike in the PV case, cannot access the screen (to use as a
> >>>> console) when in a non-default mode (i.e. not 80x25 text), as the
> >>>> necessary information (in particular about VESA-bases LFB modes) is
> >>>> not communicated. On the hypervisor side this looks like deliberate
> >>>> behavior, but it is unclear to me what the intentions were towards
> >>>> an alternative model. (X may be able to access the screen depending
> >>>> on whether it has a suitable driver besides the presently unusable
> >>>> /dev/fb<N> based one.)
> >>>
> >>> I had to admit most of my boxes are headless servers, albeit I have
> >>> one NUC I can use to test gfx stuff, so I don't really use gfx output
> >>> with Xen.
> >>>
> >>> As I understand such information is fetched from the BIOS and passed
> >>> into Xen, which should then hand it over to the dom0 kernel?
> >>
> >> That's how PV Dom0 learns of the information, yes. See
> >> fill_console_start_info(). (I'm in the process of eliminating the
> >> need for some of the "fetch from BIOS" in Xen right now, but that's
> >> not going to get us as far as being able to delete that code, no
> >> matter how much in particular Andrew would like that to happen.)
> >>
> >>> I guess the only way for Linux dom0 kernel to fetch that information
> >>> would be to emulate the BIOS or drop into realmode and issue the BIOS
> >>> calls?
> >>
> >> Native Linux gets this information passed from the boot loader, I think
> >> (except in the EFI case, as per below).
> >>
> >>> Is that an issue on UEFI also, or there dom0 can fetch the framebuffer
> >>> info using the PV EFI interface?
> >>
> >> There it's EFI boot services functions which can be invoked before
> >> leaving boot services (in the native case). Aiui the PVH entry point
> >> lives logically past any EFI boot services interaction, and hence
> >> using them is not an option (if there was EFI firmware present in Dom0
> >> in the first place, which I consider difficult all by itself - this
> >> can't be the physical system's firmware, but I also don't see where
> >> virtual firmware would be taken from).
> >>
> >> There is no PV EFI interface to obtain video information. With the
> >> needed information getting passed via start_info, PV has no need for
> >> such, and I would be hesitant to add a fundamentally redundant
> >> interface for PVH. The more that the information needed isn't EFI-
> >> specific at all.
> >
> > I think our only option is to expand the HVM start info information to
> > convey that data from Xen into dom0.
>
> PHV doesn't use the ordinary start_info, does it?
No, it's HVM start info as described in:
xen/include/public/arch-x86/hvm/start_info.h
We have already extended it once to add a memory map, we could extend
it another time to add the video information.
Roger.