Re: [PATCH v3 0/2] efi/x86: Call set_os() protocol on dual GPU Macs

From: Ard Biesheuvel
Date: Thu Jul 25 2024 - 08:39:58 EST


On Wed, 24 Jul 2024 at 18:27, Aditya Garg <gargaditya08@xxxxxxxx> wrote:
>
>
>
> > On 24 Jul 2024, at 9:31 PM, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> >
> > On Tue, Jul 23, 2024 at 04:25:19PM +0000, Aditya Garg wrote:
> >>> On Wed, Jul 17, 2024 at 04:35:15PM +0000, Aditya Garg wrote:
> >>> For the Macs having a single GPU, in case a person uses an eGPU,
> >>> they still need this apple-set-os quirk for hybrid graphics.
> >>
> >> Sending this message again as for some reason it got sent only to Lukas:
> >>
> >> Full model name: Mac mini (2018) (Macmini8,1)
> >>
> >> The drive link below has the logs:
> >>
> >> https://drive.google.com/file/d/1P3-GlksU6WppvzvWC0A-nAoTZh7oPPxk/view?usp=drive_link
> >
> > Some observations:
> >
> > * dmesg-with-egpu.txt: It seems the system was actually booted *without*
> > an eGPU, so the filename appears to be a misnomer.
> >
> > * The two files in the with_apple_set_os_efi directory only contain
> > incomplete dmesg output. Boot with log_buf_len=16M to solve this.
> > Fortunately the truncated log is sufficient to see what's going on.
> >
> > * If the apple_set_os protocol is not used, the attached eGPU is not
> > enumerated by the kernel on boot and a rescan is required.
> > So neither the iGPU nor the eGPU are working. The reason is BIOS
> > sets up incorrect bridge windows for the Thunderbolt host controller:
> > Its two downstream ports' 64-bit windows overlap. The 32-bit windows
> > do not overlap. If apple_set_os is used, the eGPU is using the
> > (non-overlapping) 32-bit window. If apple_set_os is not used,
> > the attached eGPU is using the (overlapping, hence broken) 64-bit window.
> >
> > So not only is apple_set_os needed to keep the iGPU enabled,
> > but also to ensure BIOS sets up bridge windows in a manner that is
> > only halfway broken and not totally broken.
> >
> > Below, 0000:06:01.0 and 0000:06:04.0 are the downstream ports on the
> > Thunderbolt host controller and 0000:09:00.0 is the upstream port of
> > the attached eGPU.
> >
> > iGPU enabled, no eGPU attached (dmesg.txt):
> > pci 0000:06:01.0: bridge window [mem 0x81900000-0x888fffff]
> > pci 0000:06:01.0: bridge window [mem 0xb1400000-0xb83fffff 64bit pref]
> > pci 0000:06:04.0: bridge window [mem 0x88900000-0x8f8fffff]
> > pci 0000:06:04.0: bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
> >
> > iGPU disabled, eGPU attached, apple_set_os not used (journalctl.txt):
> > pci 0000:06:01.0: bridge window [mem 0x81900000-0x888fffff]
> > pci 0000:06:01.0: bridge window [mem 0xb1400000-0xc6ffffff 64bit pref]
> > pci 0000:06:04.0: bridge window [mem 0x88900000-0x8f8fffff]
> > pci 0000:06:04.0: bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
> > pci 0000:06:04.0: bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]: can't claim; address conflict with PCI Bus 0000:09 [mem 0xb1400000-0xbf3fffff 64bit pref]
> >
> > iGPU enabled, eGPU attached, apple_set_os used (working-journalctl.txt):
> > pci 0000:06:01.0: bridge window [mem 0x81900000-0x888fffff]
> > pci 0000:06:01.0: bridge window [mem 0xb1400000-0xc6ffffff 64bit pref]
> > pci 0000:06:04.0: bridge window [mem 0x88900000-0x8f8fffff]
> > pci 0000:06:04.0: bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
> > pci 0000:09:00.0: bridge window [mem 0x81900000-0x81cfffff]
> >
> > * As to how we can solve this and keep using apple_set_os only when
> > necessary:
> >
> > I note that on x86, the efistub walks over all PCI devices in the system
> > (see setup_efi_pci() in drivers/firmware/efi/libstub/x86-stub.c) and
> > retrieves the Device ID and Vendor ID. We could additionally retrieve
> > the Class Code and count the number of GPUs in the system by checking
> > whether the Class Code matches PCI_BASE_CLASS_DISPLAY. If there's
> > at least 2 GPUs in the system, invoke apple_set_os.
>
> This also looks like a good idea, but I'm not well aware of the pci quirks in the Linux kernel. So, would consider it a bug report for the maintainers to fix.

That is not how it works.

This is not a regression in Linux, and even if it was, it is not the
maintainers' job to fix bugs.

If Linux is lacking functionality that you find important, please
propose a patch that implements it, and argue why it should be merged.
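
As a starting point for such a patch, the GPU counting Lukas suggests in the
quoted text (checking each device's Class Code against PCI_BASE_CLASS_DISPLAY
and invoking apple_set_os when there are at least two matches) could look
roughly like the sketch below. This is a standalone illustration, not efistub
code. count_display_devices() and needs_apple_set_os() are hypothetical names,
and the input is assumed to be the 32-bit dword at config space offset 0x08
(revision ID in the low byte, class code in the upper three bytes). How that
dword would be read inside setup_efi_pci() is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as PCI_BASE_CLASS_DISPLAY in include/linux/pci_ids.h. */
#define PCI_BASE_CLASS_DISPLAY 0x03

/*
 * Extract the base class from the dword at config offset 0x08.
 * Layout: bits 31:24 base class, 23:16 subclass, 15:8 prog-if, 7:0 revision.
 */
static uint8_t pci_base_class(uint32_t class_rev)
{
	return (uint8_t)(class_rev >> 24);
}

/* Hypothetical helper: count devices whose base class is "display". */
static size_t count_display_devices(const uint32_t *class_rev, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (pci_base_class(class_rev[i]) == PCI_BASE_CLASS_DISPLAY)
			count++;
	}
	return count;
}

/* Hypothetical policy from the thread: use apple_set_os with 2 or more GPUs. */
static bool needs_apple_set_os(const uint32_t *class_rev, size_t n)
{
	return count_display_devices(class_rev, n) >= 2;
}
```

The caller would collect one dword per enumerated PCI function and pass the
array to needs_apple_set_os(). Whether an eGPU is already visible on the bus
at that point in boot would need to be verified on real hardware.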
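
For reference, the bridge window overlap described in the quoted dmesg
excerpts can be checked with a few lines of C. The sketch below is only an
illustration of the arithmetic: struct mem_window and windows_overlap() are
hypothetical names, and the ranges are treated as inclusive, as they are
printed in the log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: an inclusive [start, end] memory window. */
struct mem_window {
	uint64_t start;
	uint64_t end;
};

/* Two inclusive ranges overlap if each one starts before the other ends. */
static bool windows_overlap(struct mem_window a, struct mem_window b)
{
	return a.start <= b.end && b.start <= a.end;
}
```

With the values from the logs, the 64-bit windows of 0000:06:01.0
(0xb1400000-0xc6ffffff) and 0000:06:04.0 (0xb8400000-0xbf3fffff) overlap when
the eGPU is attached, while the 32-bit windows (0x81900000-0x888fffff and
0x88900000-0x8f8fffff) do not.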