Re: [PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments
From: Jan Beulich
Date: Tue Sep 14 2021 - 11:14:09 EST
On 14.09.2021 14:41, Roger Pau Monné wrote:
> On Tue, Sep 14, 2021 at 01:58:29PM +0200, Jan Beulich wrote:
>> On 14.09.2021 13:15, Roger Pau Monné wrote:
>>> On Tue, Sep 14, 2021 at 11:03:23AM +0200, Jan Beulich wrote:
>>>> On 14.09.2021 10:32, Roger Pau Monné wrote:
>>>>> On Tue, Sep 07, 2021 at 12:04:34PM +0200, Jan Beulich wrote:
>>>>>> In order to try to debug hypervisor side breakage from XSA-378 I found
>>>>>> myself urged to finally give PVH Dom0 a try. Sadly things didn't work
>>>>>> quite as expected. In the course of investigating these issues I actually
>>>>>> spotted one piece of PV Dom0 breakage as well, a fix for which is also
>>>>>> included here.
>>>>>>
>>>>>> There are two immediate remaining issues (also mentioned in affected
>>>>>> patches):
>>>>>>
>>>>>> 1) It is not clear to me how PCI device reporting is to work. PV Dom0
>>>>>> reports devices as they're discovered, including ones the hypervisor
>>>>>> may not have been able to discover itself (ones on segments other
>>>>>> than 0 or hotplugged ones). The respective hypercall, however, is
>>>>>> inaccessible to PVH Dom0. Depending on the answer to this, either
>>>>>> the hypervisor will need changing (to permit the call) or patch 2
>>>>>> here will need further refinement.
>>>>>
>>>>> I would prefer if we could limit the hypercall usage to only
>>>>> reporting hotplugged segments to Xen. Xen would then have to scan the
>>>>> segment when it is reported and add any devices found.
>>>>>
>>>>> Such a hypercall must be used before dom0 tries to access any device, as
>>>>> otherwise the BARs won't be mapped in the second stage translation and
>>>>> the traps for the MCFG area won't be set up either.
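To make "scan the segment when reported" a little more concrete, here is a minimal sketch of such a scan, assuming the standard ECAM (MCFG) layout in which every function has a 4 KiB configuration window. scan_segment(), the accessor and callback types, and the fake accessors used to exercise it are hypothetical illustrations, not Xen's actual PCI scanning code.

```c
#include <assert.h>
#include <stdint.h>

/* ECAM (MCFG) offset of a config register, relative to the base of a
 * region whose first bus is 0:
 * (bus << 20) | (dev << 15) | (fn << 12) | reg. */
static uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
    assert(dev < 32 && fn < 8 && reg < 4096);
    return ((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
           ((uint64_t)fn << 12) | reg;
}

/* Hypothetical hooks supplied by the caller. */
typedef uint16_t (*cfg_read16_t)(uint64_t off);
typedef uint8_t (*cfg_read8_t)(uint64_t off);
typedef void (*dev_found_t)(uint8_t bus, uint8_t dev, uint8_t fn);

/* Scan buses start_bus..end_bus of one segment, calling found() for every
 * function present. Returns the number of functions found. */
static unsigned int scan_segment(uint8_t start_bus, uint8_t end_bus,
                                 cfg_read16_t rd16, cfg_read8_t rd8,
                                 dev_found_t found)
{
    unsigned int count = 0;

    for (unsigned int bus = start_bus; bus <= end_bus; bus++)
        for (uint8_t dev = 0; dev < 32; dev++)
            for (uint8_t fn = 0; fn < 8; fn++) {
                /* Vendor ID 0xffff: no function at this address. */
                if (rd16(ecam_offset(bus, dev, fn, 0x00)) == 0xffff) {
                    if (fn == 0)
                        break; /* empty slot: skip functions 1-7 */
                    continue;
                }
                found(bus, dev, fn);
                count++;
                /* Header Type bit 7 clear: single-function device. */
                if (fn == 0 && !(rd8(ecam_offset(bus, dev, 0, 0x0e)) & 0x80))
                    break;
            }
    return count;
}

/* Fake config space for illustration only: 00:00.0 (single function)
 * and 00:03.0-1 (multi-function); everything else reads as all-ones. */
static uint16_t fake_rd16(uint64_t off)
{
    unsigned int bus = (off >> 20) & 0xff, dev = (off >> 15) & 0x1f;
    unsigned int fn = (off >> 12) & 0x7;

    if (bus == 0 && ((dev == 0 && fn == 0) || (dev == 3 && fn < 2)))
        return 0x1234;
    return 0xffff;
}

/* Header Type: multi-function for device 3, single-function otherwise. */
static uint8_t fake_rd8(uint64_t off)
{
    return ((off >> 15) & 0x1f) == 3 ? 0x80 : 0x00;
}

static void ignore_dev(uint8_t bus, uint8_t dev, uint8_t fn)
{
    (void)bus; (void)dev; (void)fn;
}
```

With the fake accessors, scanning bus 0 finds three functions (00:00.0, 00:03.0 and 00:03.1). A real implementation would additionally have to honour the MCFG entry's start bus and recurse behind bridges.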
>>>>
>>>> This might work if hotplugging would only ever be of segments, and not
>>>> of individual devices. Yet the latter is, I think, a common case (as
>>>> far as hotplugging itself is "common").
>>>
>>> Right, I agree with using hypercalls to report either hotplugged segments
>>> or devices. However, I would like to avoid mandating use of the
>>> hypercalls for non-hotplug cases, so that OSes without hotplug
>>> support don't need to care about making use of them.
>>>
>>>> Also don't forget about SR-IOV VFs - they would typically not be there
>>>> when booting. They would materialize when the PF driver initializes
>>>> the device. This is, I think, something that can be dealt with by
>>>> intercepting writes to the SR-IOV capability.
>>>
>>> My plan was to indeed trap SR-IOV capability accesses, see:
>>>
>>> https://lore.kernel.org/xen-devel/20180717094830.54806-1-roger.pau@citrix.com/
>>>
>>> I just don't have time ATM to continue this work.
>>>
>>>> But I wonder whether
>>>> there might be other cases where devices become "visible" only while
>>>> the Dom0 kernel is already running.
>>>
>>> I would consider those a kind of hotplugged device, and hence they would
>>> require the use of the hypercall in order to notify Xen about them.
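A sketch of the SR-IOV interception idea, assuming the register layout of the SR-IOV extended capability from the PCIe base specification (SR-IOV Control at +0x08 with VF Enable in bit 0, NumVFs at +0x10, First VF Offset at +0x14, VF Stride at +0x16). When a trapped write flips VF Enable from 0 to 1, the routing IDs of the new VFs follow from the PF's routing ID, First VF Offset and VF Stride; those are the devices that would then need adding on the Xen side. struct sriov_state and both functions are hypothetical, not taken from the linked series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* VF Enable bit in the SR-IOV Control register (capability offset 0x08). */
#define SRIOV_CTRL_VFE 0x0001

/* Hypothetical snapshot of a PF's SR-IOV registers. First VF Offset and
 * VF Stride may depend on NumVFs, so they must be (re)read after NumVFs
 * has been programmed. */
struct sriov_state {
    uint16_t ctrl;            /* SR-IOV Control, +0x08 */
    uint16_t num_vfs;         /* NumVFs, +0x10 */
    uint16_t first_vf_offset; /* First VF Offset, +0x14 */
    uint16_t vf_stride;       /* VF Stride, +0x16 */
};

/* Routing ID (bus << 8 | devfn) of VF n (0-based) belonging to the PF
 * with routing ID pf_rid; the sum wraps at 16 bits, as routing IDs do. */
static uint16_t vf_rid(uint16_t pf_rid, const struct sriov_state *s,
                       unsigned int n)
{
    return (uint16_t)(pf_rid + s->first_vf_offset + n * s->vf_stride);
}

/* Handle a trapped write of val to SR-IOV Control. On a 0 -> 1 transition
 * of VF Enable, store up to max new VF routing IDs in out and return how
 * many were stored; otherwise return 0. */
static unsigned int sriov_ctrl_write(struct sriov_state *s, uint16_t pf_rid,
                                     uint16_t val, uint16_t *out,
                                     unsigned int max)
{
    assert(s != NULL && (out != NULL || max == 0));

    bool was_enabled = s->ctrl & SRIOV_CTRL_VFE;
    bool now_enabled = val & SRIOV_CTRL_VFE;
    unsigned int n = 0;

    s->ctrl = val; /* a real handler would also forward the write to hardware */
    if (!was_enabled && now_enabled)
        for (; n < s->num_vfs && n < max; n++)
            out[n] = vf_rid(pf_rid, s, n);
    return n;
}
```

For example, a PF at 01:00.0 (routing ID 0x0100) with First VF Offset 0x80 and VF Stride 2 places its second VF at routing ID 0x0182, i.e. 01:10.2.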
>>
>> So what does this mean for the one patch? Should drivers/xen/pci.c
>> then be built for PVH (and then have logic added to filter boot
>> time device discovery), or should I restrict this to be PV-only (and
>> PVH would get some completely different logic added later)?
>
> I think we can reuse the same hypercalls for PVH, and maybe the same
> code in Linux. For PVH we just need to be careful to make the
> hypercalls before attempting to access the BARs (or the PCI
> configuration space for the device) since there won't be any traps
> set up, and BARs won't be mapped on the p2m.
>
> It might be easier for Linux to just report every device it finds to
> Xen, like it's currently done for PV dom0, instead of filtering on
> whether the device has been hotplugged.
Okay. I'll leave the Linux patch as is then and instead make a Xen
patch to actually let through the necessary function(s) in
hvm_physdev_op().
>>>>>> 2) Dom0, unlike in the PV case, cannot access the screen (to use as a
>>>>>> console) when in a non-default mode (i.e. not 80x25 text), as the
>>>>>> necessary information (in particular about VESA-based LFB modes) is
>>>>>> not communicated. On the hypervisor side this looks like deliberate
>>>>>> behavior, but it is unclear to me what the intentions were towards
>>>>>> an alternative model. (X may be able to access the screen depending
>>>>>> on whether it has a suitable driver besides the presently unusable
>>>>>> /dev/fb<N> based one.)
>>>>>
>>>>> I have to admit most of my boxes are headless servers, although I have
>>>>> one NUC I can use to test gfx stuff, so I don't really use gfx output
>>>>> with Xen.
>>>>>
>>>>> As I understand it, such information is fetched from the BIOS and passed
>>>>> into Xen, which should then hand it over to the dom0 kernel?
>>>>
>>>> That's how PV Dom0 learns of the information, yes. See
>>>> fill_console_start_info(). (I'm in the process of eliminating the
>>>> need for some of the "fetch from BIOS" in Xen right now, but that's
>>>> not going to get us as far as being able to delete that code, no
>>>> matter how much in particular Andrew would like that to happen.)
>>>>
>>>>> I guess the only way for Linux dom0 kernel to fetch that information
>>>>> would be to emulate the BIOS or drop into realmode and issue the BIOS
>>>>> calls?
>>>>
>>>> Native Linux gets this information passed from the boot loader, I think
>>>> (except in the EFI case, as per below).
>>>>
>>>>> Is that also an issue on UEFI, or can dom0 fetch the framebuffer
>>>>> info there using the PV EFI interface?
>>>>
>>>> There it's EFI boot services functions which can be invoked before
>>>> leaving boot services (in the native case). Aiui the PVH entry point
>>>> lives logically past any EFI boot services interaction, and hence
>>>> using them is not an option (if there was EFI firmware present in Dom0
>>>> in the first place, which I consider difficult all by itself - this
>>>> can't be the physical system's firmware, but I also don't see where
>>>> virtual firmware would be taken from).
>>>>
>>>> There is no PV EFI interface to obtain video information. With the
>>>> needed information getting passed via start_info, PV has no need for
>>>> such, and I would be hesitant to add a fundamentally redundant
>>>> interface for PVH, all the more so as the information needed isn't
>>>> EFI-specific at all.
>>>
>>> I think our only option is to expand the HVM start info information to
>>> convey that data from Xen into dom0.
>>
>> PVH doesn't use the ordinary start_info, does it?
>
> No, it's HVM start info as described in:
>
> xen/include/public/arch-x86/hvm/start_info.h
>
> We have already extended it once to add a memory map; we could extend
> it again to add the video information.
Okay, I'll try to make a(nother) patch along these lines. Since there's
a DomU counterpart in PV's start_info - where does that information get
passed for PVH? (I'm mainly wondering whether there's another approach
to consider.)
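To give the idea some shape, a minimal sketch, purely for discussion: the fields up to reserved follow struct hvm_start_info as of version 1 (xen/include/public/arch-x86/hvm/start_info.h), while the version 2 fields (video_info_paddr, video_info_size, reserved2) and both helpers are hypothetical. The point is that the consumer has to use the version field to know how much of the structure it may read, as the memory map addition already required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XEN_HVM_START_MAGIC_VALUE 0x336ec578

/* struct hvm_start_info up to version 1, followed by HYPOTHETICAL
 * version 2 fields locating a dom0 video information blob. */
struct hvm_start_info_sketch {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
    uint32_t nr_modules;
    uint64_t modlist_paddr;
    uint64_t cmdline_paddr;
    uint64_t rsdp_paddr;
    /* version >= 1 */
    uint64_t memmap_paddr;
    uint32_t memmap_entries;
    uint32_t reserved;
    /* HYPOTHETICAL, version >= 2 */
    uint64_t video_info_paddr;
    uint32_t video_info_size;
    uint32_t reserved2;
};

_Static_assert(offsetof(struct hvm_start_info_sketch, memmap_paddr) == 40,
               "version 0 layout ends at byte 40");
_Static_assert(offsetof(struct hvm_start_info_sketch, video_info_paddr) == 56,
               "version 1 layout ends at byte 56");
_Static_assert(sizeof(struct hvm_start_info_sketch) == 72,
               "hypothetical version 2 layout is 72 bytes");

/* Number of bytes a consumer may read for a given structure version
 * (newer versions are assumed to only append fields). */
static size_t start_info_valid_size(uint32_t version)
{
    if (version == 0)
        return offsetof(struct hvm_start_info_sketch, memmap_paddr);
    if (version == 1)
        return offsetof(struct hvm_start_info_sketch, video_info_paddr);
    return sizeof(struct hvm_start_info_sketch);
}

/* True if the (hypothetical) video information is present; the version
 * check comes first so older, shorter structures are never over-read. */
static bool start_info_has_video(const struct hvm_start_info_sketch *si)
{
    assert(si != NULL);
    return si->magic == XEN_HVM_START_MAGIC_VALUE && si->version >= 2 &&
           si->video_info_paddr != 0 && si->video_info_size != 0;
}
```

Pointing at a separately sized blob, rather than embedding a video structure directly, mirrors the way PV's start_info locates the Dom0 console information via console.dom0.info_off/info_size, and would let the blob's format evolve without further start info version bumps. That is one possible design choice, not a settled proposal.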
Jan