Re: [PATCH 00/24] Thunderbolt security levels and NVM firmware upgrade

From: mika.westerberg@xxxxxxxxxxxxxxx
Date: Fri Aug 11 2017 - 11:14:11 EST


On Thu, May 25, 2017 at 03:03:07PM +0300, mika.westerberg@xxxxxxxxxxxxxxx wrote:
> On Thu, May 25, 2017 at 11:04:08AM +0300, mika.westerberg@xxxxxxxxxxxxxxx wrote:
> > On Thu, May 25, 2017 at 10:20:10AM +0300, mika.westerberg@xxxxxxxxxxxxxxx wrote:
> > > On Wed, May 24, 2017 at 07:32:45PM +0000, Jamet, Michael wrote:
> > > > I talked to our BIOS expert today. Here is his advice for debugging further:
> > > >
> > > > It looks like something may have gone wrong from the system (BIOS, FW, others...) perspective.
> > > > On reboot, enter the EFI shell and check the resources of the
> > > > pci 0000:01:00.0 bridge.
> > > > At the EFI shell, this bridge MUST be either configured or absent.
> > > >
> > > > I would start this way; once we have this info, we can circle back to
> > > > him and look into the next debugging step.
> > >
> > > Thanks, I'll try this today.
> >
> >
> > These are the contents dumped directly from the EFI shell when a device
> > is connected. It seems that vendor_id/device_id read 0xffff, but the rest
> > of the config space seems to be present (although not fully configured):
> >
> > PCI Segment 00 Bus 01 Device 00 Func 00 [EFI 0001000000]
> > 00000000: FF FF FF FF 00 00 10 00-00 00 04 06 00 00 01 00 *................*
> > 00000010: 00 00 00 00 00 00 00 00-00 00 00 00 01 01 00 00 *................*
> > 00000020: 00 00 00 00 01 00 01 00-00 00 00 00 00 00 00 00 *................*
> > 00000030: 00 00 00 00 80 00 00 00-00 00 00 00 FF 01 00 00 *................*
> > 00000040: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 *................*
> > 00000050: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 *................*
> > 00000060: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 *................*
> > 00000070: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 *................*
> > 00000080: 01 88 C3 FF 08 00 00 00-05 AC 80 00 00 00 00 00 *................*
> > 00000090: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 *................*
> > 000000A0: 00 00 00 00 00 00 00 00-00 00 00 00 0D C0 00 00 *................*
> > 000000B0: 22 22 11 11 00 00 00 00-00 00 00 00 00 00 00 00 *""..............*
> > 000000C0: 10 00 52 00 20 80 E8 07-10 28 10 00 43 5C 45 00 *..R. ....(..C\E.*
> > 000000D0: 00 00 23 10 00 00 00 00-00 00 00 00 00 00 00 00 *..#.............*
> > 000000E0: 00 00 00 00 00 08 00 00-00 00 00 00 0E 00 00 00 *................*
> > 000000F0: 03 00 1E 00 00 00 00 00-00 00 00 00 00 00 00 00 *................*
> >
> > I wonder how Linux manages to find the device if vendor_id/device_id
> > reads 0xffff?
>
> OK, here's the explanation.
>
> When Linux initializes ACPI (this happens before the initial PCI scan), it
> calls acpi_initialize_objects(). This in turn causes the _INI methods of
> devices to be executed. Now, _SB.PCI0._INI() ends up calling
> \_GPE.TINI(), which executes the Thunderbolt-specific OSUP() method. The
> purpose of this method is to overwrite vendor_id/device_id with the
> correct values, on the assumption that the OS has already done the
> initial PCI scan.
>
> In the case of Linux this is not true, and that is why the upstream
> port is found half-initialized, leading to the failure.
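
The EFI shell dump quoted above can be decoded with a small standalone Python sketch. This is not kernel code, and it assumes the hand-transcribed bytes below match the dump. Field offsets follow the standard PCI type 1 (bridge) header layout, and the helper names (parse_dump, decode_header, walk_caps, pcie_port_type) are hypothetical.

The sketch shows three things. The function is a PCI-to-PCI bridge (class 06/04, header type 1) whose vendor/device ID read 0xFFFF. Its primary, secondary and subordinate bus numbers are all zero, meaning they were never programmed. Its PCI Express capability reports Device/Port Type 5, which is an upstream switch port.

```python
import struct

# Config space of pci 0000:01:00.0 as dumped from the EFI shell in this thread
# (hex columns only; the ASCII column is dropped).
DUMP = """
00000000: FF FF FF FF 00 00 10 00-00 00 04 06 00 00 01 00
00000010: 00 00 00 00 00 00 00 00-00 00 00 00 01 01 00 00
00000020: 00 00 00 00 01 00 01 00-00 00 00 00 00 00 00 00
00000030: 00 00 00 00 80 00 00 00-00 00 00 00 FF 01 00 00
00000040: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
00000050: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
00000060: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
00000070: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
00000080: 01 88 C3 FF 08 00 00 00-05 AC 80 00 00 00 00 00
00000090: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
000000A0: 00 00 00 00 00 00 00 00-00 00 00 00 0D C0 00 00
000000B0: 22 22 11 11 00 00 00 00-00 00 00 00 00 00 00 00
000000C0: 10 00 52 00 20 80 E8 07-10 28 10 00 43 5C 45 00
000000D0: 00 00 23 10 00 00 00 00-00 00 00 00 00 00 00 00
000000E0: 00 00 00 00 00 08 00 00-00 00 00 00 0E 00 00 00
000000F0: 03 00 1E 00 00 00 00 00-00 00 00 00 00 00 00 00
"""

CAP_NAMES = {0x01: "Power Management", 0x05: "MSI",
             0x0D: "Bridge Subsystem Vendor ID", 0x10: "PCI Express"}
PCI_STATUS_CAP_LIST = 0x10  # Status register bit 4: capability list present


def parse_dump(text: str) -> bytes:
    """Turn EFI-shell 'offset: b0 b1 ... b15' lines into a 256-byte buffer."""
    cfg = bytearray(256)
    for line in text.strip().splitlines():
        offset, hex_part = line.split(":", 1)
        base = int(offset, 16)
        for i, tok in enumerate(hex_part.replace("-", " ").split()[:16]):
            cfg[base + i] = int(tok, 16)
    return bytes(cfg)


def u16(cfg: bytes, off: int) -> int:
    # PCI config space registers are little-endian.
    return struct.unpack_from("<H", cfg, off)[0]


def decode_header(cfg: bytes) -> dict:
    """Decode the type 1 (PCI-to-PCI bridge) header fields that matter here."""
    return {
        "vendor_id": u16(cfg, 0x00),
        "device_id": u16(cfg, 0x02),
        "status": u16(cfg, 0x06),
        "class": (cfg[0x0B], cfg[0x0A]),             # (base class, subclass); 06/04 = PCI-PCI bridge
        "header_type": cfg[0x0E] & 0x7F,             # 1 = bridge header
        "buses": (cfg[0x18], cfg[0x19], cfg[0x1A]),  # primary, secondary, subordinate
    }


def walk_caps(cfg: bytes) -> list:
    """Follow the capability list starting at the pointer at offset 0x34."""
    caps, seen = [], set()
    if not u16(cfg, 0x06) & PCI_STATUS_CAP_LIST:
        return caps
    ptr = cfg[0x34] & 0xFC
    while ptr and ptr not in seen:      # 'seen' guards against a looping list
        seen.add(ptr)
        caps.append((ptr, cfg[ptr]))    # (offset, capability ID)
        ptr = cfg[ptr + 1] & 0xFC       # next-capability pointer
    return caps


def pcie_port_type(cfg: bytes):
    """Return Device/Port Type (bits 7:4 of the PCIe Capabilities register)."""
    for off, cap_id in walk_caps(cfg):
        if cap_id == 0x10:
            return (u16(cfg, off + 2) >> 4) & 0xF
    return None


if __name__ == "__main__":
    cfg = parse_dump(DUMP)
    for name, value in decode_header(cfg).items():
        print(name, value)
    for off, cap_id in walk_caps(cfg):
        print(f"cap 0x{cap_id:02X} at 0x{off:02X}: {CAP_NAMES.get(cap_id, '?')}")
    print("PCIe Device/Port Type:", pcie_port_type(cfg))  # 5 = switch upstream port
```

The all-zero bus numbers fit the "half-initialized" description. The rest of the header and the capability chain are intact, so only the ID registers (and the bridge programming) were still pending.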

We finally found out what the problem is. In short, the above
_SB.PCI0._INI() (and OSUP()) is called correctly, but this is only
part of the story. When OSUP() is called, it rewrites the vendor/device
ID and then signals that a certain GPE handler can go on to trigger the
SMI handler, which should enumerate all the devices before the PCI scan
happens.

In Linux we enable GPEs later, after the PCI scan, so we only see
partially configured PCI bridges (because the SMI handler has not run yet).
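
To make the ordering concrete, here is a toy Python model. It is not kernel or ACPICA code, and only the order of events is modeled. The step names mirror the methods mentioned in this thread. The boot() function and its enable_gpes_before_pci_scan flag are hypothetical, and the SMI handler is assumed to run immediately once GPEs are enabled, which is a simplification.

```python
def boot(enable_gpes_before_pci_scan: bool) -> list[str]:
    """Toy model of the boot ordering; returns a trace of events."""
    state = {"osup_done": False, "bridges_configured": False}
    trace = []

    def acpi_initialize_objects():
        # _SB.PCI0._INI() -> \_GPE.TINI() -> OSUP(): rewrite vendor/device ID,
        # then signal that the GPE handler may go on to trigger the SMI handler.
        state["osup_done"] = True
        trace.append("ACPI _INI: OSUP() rewrote vendor/device ID")

    def enable_gpes():
        trace.append("GPEs enabled")
        if state["osup_done"]:
            # Simplification: the GPE fires and the SMI handler runs right away.
            state["bridges_configured"] = True
            trace.append("SMI handler enumerated devices")

    def pci_scan():
        status = "configured" if state["bridges_configured"] else "partially configured"
        trace.append(f"PCI scan: upstream port {status}")

    acpi_initialize_objects()
    if enable_gpes_before_pci_scan:
        enable_gpes()
        pci_scan()
    else:                      # current Linux ordering: GPEs enabled after the scan
        pci_scan()
        enable_gpes()
    return trace


for early in (False, True):
    print(f"GPEs enabled before PCI scan = {early}")
    for event in boot(early):
        print("  " + event)
```

With the flag off, the scan runs while the bridges are still partially configured, and the SMI handler only gets to run afterwards. With the flag on, the SMI handler runs first, which models the effect of enabling GPEs earlier.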

Rafael's patch series here:

https://lkml.org/lkml/2017/8/9/1017

addresses this by, among other things, enabling GPEs earlier.