Re: [PATCH] vgaarb: Use ACPI HID name to find integrated GPU
From: Kai-Heng Feng
Date: Wed Sep 22 2021 - 23:20:46 EST
On Sat, Sep 18, 2021 at 12:55 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Fri, Sep 17, 2021 at 11:49:45AM +0800, Kai-Heng Feng wrote:
> > On Fri, Sep 17, 2021 at 12:38 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > >
> > > [+cc Huacai, linux-pci]
> > >
> > > On Wed, May 19, 2021 at 09:57:23PM +0800, Kai-Heng Feng wrote:
> > > > Commit 3d42f1ddc47a ("vgaarb: Keep adding VGA device in queue") assumes
> > > > the first device is an integrated GPU. However, on AMD platforms an
> > > > integrated GPU can have higher PCI device number than a discrete GPU.
> > > >
> > > > Integrated GPU on ACPI platform generally has _DOD and _DOS method, so
> > > > use that as predicate to find integrated GPU. If the new strategy
> > > > doesn't work, fallback to use the first device as boot VGA.
> > > >
> > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> > > > ---
> > > > drivers/gpu/vga/vgaarb.c | 31 ++++++++++++++++++++++++++-----
> > > > 1 file changed, 26 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
> > > > index 5180c5687ee5..949fde433ea2 100644
> > > > --- a/drivers/gpu/vga/vgaarb.c
> > > > +++ b/drivers/gpu/vga/vgaarb.c
> > > > @@ -50,6 +50,7 @@
> > > > #include <linux/screen_info.h>
> > > > #include <linux/vt.h>
> > > > #include <linux/console.h>
> > > > +#include <linux/acpi.h>
> > > >
> > > > #include <linux/uaccess.h>
> > > >
> > > > @@ -1450,9 +1451,23 @@ static struct miscdevice vga_arb_device = {
> > > > MISC_DYNAMIC_MINOR, "vga_arbiter", &vga_arb_device_fops
> > > > };
> > > >
> > > > +#if defined(CONFIG_ACPI)
> > > > +static bool vga_arb_integrated_gpu(struct device *dev)
> > > > +{
> > > > +	struct acpi_device *adev = ACPI_COMPANION(dev);
> > > > +
> > > > +	return adev && !strcmp(acpi_device_hid(adev), ACPI_VIDEO_HID);
> > > > +}
> > > > +#else
> > > > +static bool vga_arb_integrated_gpu(struct device *dev)
> > > > +{
> > > > +	return false;
> > > > +}
> > > > +#endif
> > > > +
> > > > static void __init vga_arb_select_default_device(void)
> > > > {
> > > > - struct pci_dev *pdev;
> > > > + struct pci_dev *pdev, *found = NULL;
> > > > struct vga_device *vgadev;
> > > >
> > > > #if defined(CONFIG_X86) || defined(CONFIG_IA64)
> > > > @@ -1505,20 +1520,26 @@ static void __init vga_arb_select_default_device(void)
> > > > #endif
> > > >
> > > > if (!vga_default_device()) {
> > > > - list_for_each_entry(vgadev, &vga_list, list) {
> > > > + list_for_each_entry_reverse(vgadev, &vga_list, list) {
> > >
> > > Hi Kai-Heng, do you remember why you changed the order of this list
> > > traversal?
> >
> > The descending order is to keep the original behavior.
> >
> > Before this patch, it breaks out of the loop as early as possible, so
> > the lower numbered device is picked.
> > This patch makes it only break out of the loop when ACPI_VIDEO_HID
> > device is found.
> > So if there are more than one device that meet "cmd & (PCI_COMMAND_IO
> > | PCI_COMMAND_MEMORY)", higher numbered device will be selected.
> > So the traverse order reversal is to keep the original behavior.
>
> Can you give an example of what you mean? I don't quite follow how it
> keeps the original behavior.
>
> If we have this:
>
> 0 PCI_COMMAND_MEMORY set ACPI_VIDEO_HID
> 1 PCI_COMMAND_MEMORY set ACPI_VIDEO_HID
>
> Previously we didn't look for ACPI_VIDEO_HID, so we chose 0, now we
> choose 1, which seems wrong. In the absence of other information, I
> would prefer the lower-numbered device.
>
> Or this:
>
> 0 PCI_COMMAND_MEMORY set
> 1 PCI_COMMAND_MEMORY set ACPI_VIDEO_HID
>
> Previously we chose 0; now we choose 1, which does seem right, but
> we'd choose 1 regardless of the order.
>
> Or this:
>
> 0 PCI_COMMAND_MEMORY set ACPI_VIDEO_HID
> 1 PCI_COMMAND_MEMORY set
>
> Previously we chose 0, now we still choose 0, which seems right but
> again doesn't depend on the order.
>
> The first case, where both devices are ACPI_VIDEO_HID, is the only one
> where the order matters, and I suggest that we should be using the
> original order, not the reversed order.

Consider this:

0 PCI_COMMAND_MEMORY set
1 PCI_COMMAND_MEMORY set

Before this patch, device 0 would be picked. If the original (ascending)
traversal order is kept, device 1 will be selected instead: neither device
passes vga_arb_integrated_gpu(), so the loop never breaks early and the
last matching device wins.

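To make the ordering argument concrete, here is a rough userspace model of
the selection loop. It is only a sketch, not kernel code: struct fake_vga
and pick_default() are hypothetical names made up for illustration, and the
loop body assumes the behavior described above (remember every device that
decodes I/O or memory, break only when the integrated-GPU check passes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vga_list entry; not the kernel's struct vga_device. */
struct fake_vga {
	int  number;     /* position in vga_list, low device numbers first */
	bool decodes;    /* models cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY) */
	bool acpi_video; /* models the result of vga_arb_integrated_gpu() */
};

/*
 * Model of the patched loop: every decoding device overwrites "found",
 * and the walk stops only at a device that looks like an integrated GPU.
 * Returns the chosen device number, or -1 if nothing qualifies.
 */
static int pick_default(const struct fake_vga *devs, size_t n, bool reverse)
{
	int found = -1;

	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* reverse == true models list_for_each_entry_reverse() */
		const struct fake_vga *d = &devs[reverse ? n - 1 - i : i];

		if (!d->decodes)
			continue;
		found = d->number;
		if (d->acpi_video)
			break;
	}
	return found;
}
```

With two decoding devices and no ACPI_VIDEO_HID match, the ascending walk
ends on device 1 while the reversed walk ends on device 0, which is the
pre-patch result. With both devices matching ACPI_VIDEO_HID, the reversed
walk stops at device 1, which is the case Bjorn raised.
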
Kai-Heng
>
> > > I guess the list_add_tail() in vga_arbiter_add_pci_device() means
> > > vga_list is generally ordered with small device numbers first and
> > > large ones last.
> > >
> > > So you pick the integrated GPU with the largest device number. Are
> > > there systems with more than one integrated GPU? If so, I would
> > > naively expect that in the absence of an indication otherwise, we'd
> > > want the one with the *smallest* device number.
> >
> > There's only one integrated GPU on the affected system.
> >
> > The approach is to keep the list traversal in one pass.
> > Is there any regression introduced by this patch?
> > If that's the case, we can separate the logic and find the
> > ACPI_VIDEO_HID in second pass.
>
> No regression, I'm just looking at Huacai's VGA patches, which affect
> this area.