Re: [PATCH] vfio/pci: Propagate ACPI notifications to the user-space

From: Alex Williamson
Date: Wed Mar 08 2023 - 15:07:20 EST


On Wed, 8 Mar 2023 10:45:51 -0800
Dominik Behr <dbehr@xxxxxxxxxxxx> wrote:

> On Wed, Mar 8, 2023 at 9:49 AM Alex Williamson
> <alex.williamson@xxxxxxxxxx> wrote:
>
> > Adding libvirt folks. This intentionally designs the interface in a
> > way that requires a privileged intermediary to monitor netlink on the
> > host, associate messages to VMs based on an attached device, and
> > re-inject the event to the VMM. Why wouldn't we use a channel
> > associated with the device for such events, such that the VMM has
> > direct access? The netlink path seems like it has more moving pieces,
> > possibly scalability issues, and maybe security issues?
>
> It is the same interface through which other ACPI events, such as AC
> adapter and lid events, are forwarded to user-space.
> ACPI events are not high-frequency the way interrupts are.

I'm not sure that's relevant; these interfaces don't claim to
provide isolation among host processes that manage behavior relative
to accessories. These are effectively system-level services. It's only
a very, very specialized use case that places a VMM as a peer among these
processes. Generally we don't want to grant a VMM any privileges beyond
what it absolutely needs, so a VMM managing an assigned NIC
really ought not to be able to snoop host events related to anything
other than the NIC.
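A per-device channel gives that isolation almost for free, because the VMM only holds descriptors for its own devices. Below is a minimal sketch of the eventfd semantics such a channel would rely on, assuming a hypothetical design in which vfio hands the VMM one eventfd per device for ACPI notifications. Nothing like this exists in the patch, the device_notify_*() names are made up, and a user-space write() stands in for the kernel signaling the eventfd from its notify handler.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical: one eventfd per assigned device. Non-blocking so the
 * VMM can poll it from its event loop without stalling. */
int device_notify_open(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Stand-in for the kernel side: add one to the eventfd counter. */
int device_notify_signal(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* VMM side: return the number of notifications since the last call
 * (reading resets the counter), 0 if none are pending, -1 on error. */
int64_t device_notify_drain(int efd)
{
    uint64_t count;
    ssize_t n = read(efd, &count, sizeof(count));

    if (n == (ssize_t)sizeof(count))
        return (int64_t)count;
    if (n < 0 && errno == EAGAIN)
        return 0;
    return -1;
}
```

A VMM holding only the NIC's descriptor simply has no way to observe the GPU's notifications. Note that a bare counter coalesces events and loses the 8-bit Notify value, so a real interface would still need some way to report which event fired.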

> > > > What sort of ACPI events are we expecting to see here and what does user space do with them?
> The use case we are looking at right now is D-notifier events about the
> GPU power available to mobile discrete GPUs.
> The firmware notifies the GPU driver and a resource daemon so they can
> dynamically adjust the amount of power the GPU may use.
>
> > The proposed interface really has no introspection; how does the VMM
> > know which devices need ACPI tables added "upfront"? How do these
> > events factor into hotplug device support, where we may not be able to
> > dynamically inject ACPI code into the VM?
>
> The VMM can examine the PCI IDs and the associated firmware node of the
> PCI device to figure out what events to expect and what ACPI table to
> generate to support it, but that should not be necessary.

I'm not entirely sure where your VMM is drawing the line between the VM
and management tools, but I think this is another case where the
hypervisor itself should not have privileges to examine the host
firmware tables to build its own. Something like libvirt would be
responsible for that.
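For reference, the introspection described above maps onto standard PCI sysfs attributes: the vendor and device IDs, plus the firmware_node link whose ACPI path attribute names the device's companion node. A minimal sketch of what a privileged management tool, rather than the VMM, might read; read_pci_attr() is a hypothetical helper, and the sysfs root is a parameter only so it can be exercised against a fake directory tree.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of <root>/<bdf>/<attr> into buf, without the
 * trailing newline. root would normally be "/sys/bus/pci/devices".
 * Returns 0 on success, -1 on any failure. */
int read_pci_attr(const char *root, const char *bdf, const char *attr,
                  char *buf, size_t len)
{
    char path[512];
    FILE *f;
    int n = snprintf(path, sizeof(path), "%s/%s/%s", root, bdf, attr);

    if (n < 0 || (size_t)n >= sizeof(path) || len == 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the newline sysfs appends */
    return 0;
}
```

Typical attributes would be "vendor", "device", and "firmware_node/path". Keeping this in a tool like libvirt means the VMM itself never needs read access to host firmware details.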

> A generic GPE-based ACPI event forwarder, as Grzegorz proposed, can be
> injected at VM init time and handle any notification that comes later,
> even from hotplugged devices.

It appears that the forwarder sends the Notify to a specific ACPI
device node, so it's unclear to me how that becomes boilerplate AML
added to all VMs. We'll need to notify different devices based on
different events, right?

> > The acpi_bus_generate_netlink_event() below really only seems to form a
> > u8 event type from the u32 event. Is this something that could be
> > provided directly from the vfio device uAPI with an eventfd, thus
> > providing introspection that a device supports ACPI event notifications
> > and the ability for the VMM to exclusively monitor those events, and
> > only those events for the device, without additional privileges?
>
> From what I can see, these events are 8-bit as they come from ACPI.
> They also do not carry any payload; it is up to the receiving
> driver to query any additional context/state from the device.
> This will work the same way in the VM, where the driver can query the
> same information from the passed-through PCI device.
> There are multiple other netlink-based ACPI event forwarders that
> do exactly the same thing for other devices, like the AC adapter,
> lid/power button, ACPI thermal notifications, etc.
> They all use the same mechanism, and their events can be received by
> user-space programs, whether VMMs or others.

But again, those other receivers are potentially system services, not
an isolated VM instance operating in a limited-privilege environment.
IMO, giving the host display server access to lid or power events is
very different from granting that same privilege to some arbitrary VM
that happens to have an unrelated assigned device.
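What a netlink subscriber actually receives makes the difference concrete. A sketch, assuming the payload layout of struct acpi_genl_event as found in drivers/acpi/event.c (device class, bus ID, a u32 type holding the u8 passed to acpi_bus_generate_netlink_event(), and data; check it against the kernel you target). The multicast group does no per-subscriber filtering, so any isolation is just the receiver choosing to ignore other devices' events. The mirror struct and filter_event() are illustrative names, not an existing API.

```c
#include <stdint.h>
#include <string.h>

/* Mirror of the "acpi_event" generic-netlink payload (assumed layout). */
struct acpi_genl_event_mirror {
    char device_class[20];   /* e.g. "button/lid" */
    char bus_id[15];         /* e.g. "PNP0C0D:00" or a PCI BDF */
    uint32_t type;           /* carries the 8-bit event type */
    uint32_t data;
};

/* Every listener receives every event, so a subscriber must discard
 * events for devices it does not own. Returns the 8-bit event type,
 * or -1 if the event belongs to some other device. */
int filter_event(const struct acpi_genl_event_mirror *ev,
                 const char *my_bus_id)
{
    if (strncmp(ev->bus_id, my_bus_id, sizeof(ev->bus_id)) != 0)
        return -1;
    return (int)(ev->type & 0xff);
}
```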

On my laptop, I see multiple _GPE scopes, each apparently very specific
to particular devices:

Scope (_GPE)
{
    Method (_L0C, 0, Serialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
    {
        Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change
    }

    Method (_L0D, 0, Serialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
    {
        Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change
    }

    Method (_L0F, 0, Serialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
    {
        Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change
    }
}

Scope (_GPE)
{
    Method (_L19, 0, NotSerialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
    {
        Notify (\_SB.PCI0.GP17, 0x02) // Device Wake
        Notify (\_SB.PCI0.GP17.XHC0, 0x02) // Device Wake
        Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake
        Notify (\_SB.PWRB, 0x02) // Device Wake
    }

    Method (_L08, 0, NotSerialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
    {
        Notify (\_SB.PCI0.GP18, 0x02) // Device Wake
        Notify (\_SB.PCI0.GPP0, 0x02) // Device Wake
        Notify (\_SB.PCI0.GPP1, 0x02) // Device Wake
        Notify (\_SB.PCI0.GPP5, 0x02) // Device Wake
    }
}
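Put in data terms, a generic forwarder would have to reproduce, per host, a mapping from each GPE to the set of (device, Notify value) pairs its method fires. A minimal sketch that transcribes only the direct Notify() calls from the two scopes above into a table; struct gpe_notify and gpe_fanout() are illustrative names, and methods that do more than Notify() cannot be flattened into such a table at all.

```c
#include <stddef.h>

/* One Notify() issued by a host level-triggered GPE method (_Lxx). */
struct gpe_notify {
    unsigned gpe;          /* the xx in _Lxx */
    const char *target;    /* ACPI path passed to Notify() */
    unsigned char value;   /* 0x81 = Information Change, 0x02 = Device Wake */
};

/* Transcribed from the _GPE scopes listed above. */
static const struct gpe_notify host_gpe_map[] = {
    { 0x0C, "\\_SB.PCI0.GPP0.PEGP", 0x81 },
    { 0x0D, "\\_SB.PCI0.GPP0.PEGP", 0x81 },
    { 0x0F, "\\_SB.PCI0.GPP0.PEGP", 0x81 },
    { 0x19, "\\_SB.PCI0.GP17",      0x02 },
    { 0x19, "\\_SB.PCI0.GP17.XHC0", 0x02 },
    { 0x19, "\\_SB.PCI0.GP17.XHC1", 0x02 },
    { 0x19, "\\_SB.PWRB",           0x02 },
    { 0x08, "\\_SB.PCI0.GP18",      0x02 },
    { 0x08, "\\_SB.PCI0.GPP0",      0x02 },
    { 0x08, "\\_SB.PCI0.GPP1",      0x02 },
    { 0x08, "\\_SB.PCI0.GPP5",      0x02 },
};

/* Store up to max entries fired by one GPE in out; return the total
 * number of Notify() calls that GPE fires (which may exceed max). */
size_t gpe_fanout(unsigned gpe, const struct gpe_notify **out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(host_gpe_map) / sizeof(host_gpe_map[0]); i++) {
        if (host_gpe_map[i].gpe != gpe)
            continue;
        if (n < max)
            out[n] = &host_gpe_map[i];
        n++;
    }
    return n;
}
```

Even this simple case shows one GPE fanning out to four unrelated devices, which is per-host data rather than boilerplate.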

At least one more is significantly more extensive still, calling methods
that interact with OpRegions. So how does a simple stub of a
GPE block replicate this sort of behavior in the host AML? Thanks,

Alex