Re: [REGRESSION] Errors at reboot after 722e5f2b1eec

From: Rafael J. Wysocki
Date: Fri Sep 14 2018 - 02:28:52 EST


On Friday, September 14, 2018 4:29:46 AM CEST Pingfan Liu wrote:
> On Thu, Sep 13, 2018 at 10:15 PM Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> >
> > On Thursday, September 13, 2018 12:03:36 PM CEST James Wang wrote:
> > >
> > > On 09/11/2018 02:15 PM, Takashi Iwai wrote:
> > > > On Tue, 11 Sep 2018 14:11:30 +0200,
> > > > James Wang wrote:
> > > >> I did try it with the head of the kernel tree.
> > > > OK, then the bug is present with 4.19-rc2, at least.
> > > > Please check my test kernel later (it's still being built).
> > > Hi folks, I'm attaching two logs: one from 4.19-rc3 and one from 4.19-rc3 with Rafael's suggestion applied.
> >
> > OK, no difference AFAICS.
> >
> > This means that the commit turned up by bisection simply uncovered an existing
> > ordering issue, apparently between an IOMMU and its client (i.e. it appears that
> > the client is shut down after the IOMMU).
> >
> > This isn't limited to shutdown and you'd see the same issue on system-wide
> > suspend/resume (in fact, Pingfan Liu's patches make shutdown use the
> > same device list that is used for system-wide PM).
> >
> > One way to mitigate such issues is to add a device link between the two
> > devices in question to enforce the correct suspend/resume/shutdown ordering
> > between them.
> >
> I found that the IOMMU is at 0000:00:00.2, while ohci-pci is at
> 0000:00:13.1. Hence ohci-pci should be shut down before the IOMMU.
> I'm not familiar with AMD's IOMMU code, but I think it exports no
> shutdown interface to drivers/base; it is shut down by platform code
> in arch/x86. So I think something must have torn down the IOTLB, e.g.
> by invalidating pages, before ohci-pci was shut down. I wonder whether
> adding a device link can fix this bug. (Forgive me if I have made a
> mistake; I am not familiar with this area.)

Adding a device link should help, as it effectively causes dpm_list to
be reordered in accordance with the link direction, and it also takes
care of the other children and linked devices as appropriate.
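To illustrate the ordering argument, here is a userspace toy model. It is not kernel code, and every name in it (toy_list, toy_add, toy_link, toy_shutdown_order) is hypothetical. Devices are appended to the list on registration, shutdown walks the list from the tail (as device_shutdown() and dpm_suspend() do), and a consumer -> supplier link moves the consumer to the tail so that it is visited before its supplier. The initial order, with ohci-pci registered before the IOMMU, is an assumption chosen to reproduce the symptom discussed above (client shut down after the IOMMU). In the kernel itself the link would be created with device_link_add(consumer, supplier, flags), which takes both device pointers. The real reordering also moves the consumer's children and its own consumers; this sketch ignores them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEVS 8

/* Toy model of a PM/shutdown device list; NOT the kernel's dpm_list code. */
struct toy_list {
	const char *dev[MAX_DEVS];
	int n;
};

/* Registration appends a device to the tail of the list. */
static void toy_add(struct toy_list *l, const char *name)
{
	assert(l->n < MAX_DEVS);
	l->dev[l->n++] = name;
}

/* Return the position of a device in the list, or -1 if absent. */
static int toy_index(const struct toy_list *l, const char *name)
{
	for (int i = 0; i < l->n; i++)
		if (strcmp(l->dev[i], name) == 0)
			return i;
	return -1;
}

/*
 * Hypothetical stand-in for the effect of a consumer -> supplier link:
 * the consumer is moved to the tail, so it now sits after its supplier.
 * Children and the consumer's own consumers are ignored in this toy.
 */
static void toy_link(struct toy_list *l, const char *consumer,
		     const char *supplier)
{
	int c = toy_index(l, consumer);

	if (c < 0 || toy_index(l, supplier) < 0)
		return;
	const char *moved = l->dev[c];
	memmove(&l->dev[c], &l->dev[c + 1],
		(size_t)(l->n - c - 1) * sizeof(l->dev[0]));
	l->dev[l->n - 1] = moved;
}

/* Shutdown (and suspend) visit the list backwards, from the tail. */
static int toy_shutdown_order(const struct toy_list *l, const char **out)
{
	for (int i = 0; i < l->n; i++)
		out[i] = l->dev[l->n - 1 - i];
	return l->n;
}
```

With the assumed registration order ("ohci-pci", then "iommu"), toy_shutdown_order() visits "iommu" first; after toy_link(&l, "ohci-pci", "iommu") it visits "ohci-pci" first and "iommu" last, which is the ordering the link is meant to enforce.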

The difficulty is that whoever wants to add a device link between two
devices needs to have pointers to the device objects in question upfront.

Thanks,
Rafael