Re: [RFC KERNEL PATCH v4 3/3] PCI/sysfs: Add gsi sysfs for pci_dev
From: Roger Pau Monné
Date: Wed Jan 31 2024 - 04:05:37 EST
On Tue, Jan 30, 2024 at 02:44:03PM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 30, 2024 at 10:07:36AM +0100, Roger Pau Monné wrote:
> > On Mon, Jan 29, 2024 at 04:01:13PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Jan 25, 2024 at 07:17:24AM +0000, Chen, Jiqian wrote:
> > > > On 2024/1/24 00:02, Bjorn Helgaas wrote:
> > > > > On Tue, Jan 23, 2024 at 10:13:52AM +0000, Chen, Jiqian wrote:
> > > > >> On 2024/1/23 07:37, Bjorn Helgaas wrote:
> > > > >>> On Fri, Jan 05, 2024 at 02:22:17PM +0800, Jiqian Chen wrote:
> > > > >>>> There is a need for some scenarios to use gsi sysfs.
> > > > >>>> For example, when xen passthrough a device to domU, it will
> > > > >>>> use gsi to map pirq, but currently userspace can't get gsi
> > > > >>>> number.
> > > > >>>> So, add gsi sysfs for that and for other potential scenarios.
> > > > >> ...
> > > > >
> > > > >>> I don't know enough about Xen to know why it needs the GSI in
> > > > >>> userspace. Is this passthrough brand new functionality that can't be
> > > > >>> done today because we don't expose the GSI yet?
> > >
> > > I assume this must be new functionality, i.e., this kind of
> > > passthrough does not work today, right?
> > >
> > > > >> has ACPI support and is responsible for detecting and controlling
> > > > >> the hardware, also it performs privileged operations such as the
> > > > >> creation of normal (unprivileged) domains DomUs. When we give to a
> > > > >> DomU direct access to a device, we need also to route the physical
> > > > >> interrupts to the DomU. In order to do so Xen needs to setup and map
> > > > >> the interrupts appropriately.
> > > > >
> > > > > What kernel interfaces are used for this setup and mapping?
> > > >
> > > > For passthrough devices, the setup and mapping of routing physical
> > > > interrupts to DomU are done on Xen hypervisor side, hypervisor only
> > > > need userspace to provide the GSI info, see Xen code:
> > > > xc_physdev_map_pirq require GSI and then will call hypercall to pass
> > > > GSI into hypervisor and then hypervisor will do the mapping and
> > > > routing, kernel doesn't do the setup and mapping.
> > >
> > > So we have to expose the GSI to userspace not because userspace itself
> > > uses it, but so userspace can turn around and pass it back into the
> > > kernel?
> >
> > No, the point is to pass it back to Xen, which doesn't know the
> > mapping between GSIs and PCI devices because it can't execute the ACPI
> > AML resource methods that provide such information.
> >
> > The (Linux) kernel is just a proxy that forwards the hypercalls from
> > user-space tools into Xen.
>
> But I guess Xen knows how to interpret a GSI even though it doesn't
> have access to AML?

On x86 Xen does know how to map a GSI into an IO-APIC pin, in order
to configure the RTE as requested.

> > > It seems like it would be better for userspace to pass an identifier
> > > of the PCI device itself back into the hypervisor. Then the interface
> > > could be generic and potentially work even on non-ACPI systems where
> > > the GSI concept doesn't apply.
> >
> > We would still need a way to pass the GSI to PCI device relation to
> > the hypervisor, and then cache such data in the hypervisor.
> >
> > I don't think we have any preference of where such information should
> > be exposed, but given GSIs are an ACPI concept not specific to Xen
> > they should be exposed by a non-Xen specific interface.
>
> AFAIK Linux doesn't expose GSIs directly to userspace yet. The GSI
> concept relies on ACPI MADT, _MAT, _PRT, etc. A GSI is associated
> with some device (PCI in this case) and some interrupt controller
> entry. I don't understand how a GSI value is useful without knowing
> something about that framework in which GSIs exist.

I wouldn't say it's strictly associated with PCI. A GSI is ACPI's way
of unifying all possible IO-APIC pins in the system into a single flat
number space. A GSI is useful in itself because there's a single GSI
space for the whole host.

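To illustrate what I mean by a flat space, here is a minimal sketch in C
of how a GSI could be resolved to an (IO-APIC, pin) pair. It assumes each
IO-APIC is described by its GSI base (as reported in the MADT) and its
number of pins (as read from the IO-APIC itself). The struct and function
names are hypothetical and are not taken from Xen or Linux:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one IO-APIC in the system. */
struct ioapic_desc {
    uint8_t  id;        /* IO-APIC ID */
    uint32_t gsi_base;  /* first GSI served by this IO-APIC (from the MADT) */
    uint32_t nr_pins;   /* number of pins (read from the IO-APIC itself) */
};

/* Hypothetical result type: which pin on which IO-APIC a GSI refers to. */
struct ioapic_pin {
    uint8_t  ioapic_id;
    uint32_t pin;
};

/*
 * Resolve a GSI to an IO-APIC pin. Each IO-APIC covers the range
 * [gsi_base, gsi_base + nr_pins), so the pin is just the offset of the
 * GSI from the base of the IO-APIC whose range contains it.
 * Returns false if no IO-APIC covers the GSI.
 */
static bool gsi_to_ioapic_pin(const struct ioapic_desc *apics, size_t nr,
                              uint32_t gsi, struct ioapic_pin *out)
{
    assert(apics != NULL && out != NULL);

    for (size_t i = 0; i < nr; i++) {
        if (gsi >= apics[i].gsi_base &&
            gsi - apics[i].gsi_base < apics[i].nr_pins) {
            out->ioapic_id = apics[i].id;
            out->pin = gsi - apics[i].gsi_base;
            return true;
        }
    }
    return false;
}
```

With two IO-APICs of 24 and 32 pins at GSI bases 0 and 24, GSI 28 would
resolve to pin 4 of the second IO-APIC, without needing to know which
device sits behind it.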
> Obviously I know less than nothing about Xen, so I apologize for
> asking all these stupid questions, but it just doesn't all make sense
> to me yet.

That's all fine, maybe there's a better path or way to expose this ACPI
information. Maybe introduce a per-device acpi directory and expose
it there? Or rename the entry to acpi_gsi?

Thanks, Roger.