Re: [REGRESSION] Errors at reboot after 722e5f2b1eec

From: Borislav Petkov
Date: Fri Sep 14 2018 - 03:14:39 EST


+ Jörg.

On Fri, Sep 14, 2018 at 08:26:07AM +0200, Rafael J. Wysocki wrote:
> On Friday, September 14, 2018 4:29:46 AM CEST Pingfan Liu wrote:
> > On Thu, Sep 13, 2018 at 10:15 PM Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> > >
> > > On Thursday, September 13, 2018 12:03:36 PM CEST James Wang wrote:
> > > >
> > > > On 09/11/2018 02:15 PM, Takashi Iwai wrote:
> > > > > On Tue, 11 Sep 2018 14:11:30 +0200,
> > > > > James Wang wrote:
> > > > >> I did try it from kernel : head
> > > > > OK, then the bug is present with 4.19-rc2, at least.
> > > > > Please check my test kernel later (it's still being built).
> > > > Hi folks, I attach two logs: one from 4.19-rc3 and one from 4.19-rc3 + Rafael's suggestion.
> > >
> > > OK, no difference AFAICS.
> > >
> > > This means that the commit turned up by bisection simply uncovered an existing
> > > ordering issue, apparently between an IOMMU and its client (i.e. it appears that
> > > the client is shut down after the IOMMU).
> > >
> > > This isn't limited to shutdown and you'd see the same issue on system-wide
> > > suspend/resume (in fact, Pingfan Liu's patches make shutdown use the
> > > same device list that is used for system-wide PM).
> > >
> > > One way to mitigate such issues is to add a device link between the two
> > > devices in question to enforce the correct suspend/resume/shutdown ordering
> > > between them.
> > >
> > I found the IOMMU at 0000:00:00.2, while ohci-pci is at
> > 0000:00:13.1. Hence ohci-pci should be shut down before the IOMMU. I am
> > not familiar with AMD's IOMMU code, but I think there is no shutdown
> > interface exported to drivers/base. It is shut down by platform code in
> > arch/x86. So I think something is probably tearing down the iotbl,
> > e.g. by invalidating pages, before ohci-pci is shut down. I wonder
> > whether adding a device link can fix this bug or not. (Forgive me if I
> > made a mistake, since I am not experienced in this field.)
>
> Adding a device link should help, as it effectively causes dpm_list to
> be reordered in accordance with the link direction, but it also takes
> care of the other children and linked devices as appropriate.
>
> The difficulty is that whoever wants to add a device link between two
> devices needs to have pointers to the device objects in question upfront.
>
> Thanks,
> Rafael
>
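
For readers following the thread, the reordering Rafael describes can be
illustrated with a small user-space model. This is only a sketch in Python,
not kernel code: the Device class, reorder_to_tail(), add_link() and the
"usb-hub" child are hypothetical names, and the initial list order is made
up to show the effect rather than taken from James's logs. It assumes that
shutdown walks the list from tail to head, and that adding a link moves the
consumer, its children and its own consumers to the tail.

```python
class Device:
    """Toy stand-in for a device object (hypothetical, not the kernel struct)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.consumers = []  # devices linked to this one as their supplier
        if parent is not None:
            parent.children.append(self)


def reorder_to_tail(dpm_list, dev):
    """Move dev, then its children and consumers, to the tail of the list."""
    dpm_list.remove(dev)
    dpm_list.append(dev)
    for child in dev.children:
        reorder_to_tail(dpm_list, child)
    for consumer in dev.consumers:
        reorder_to_tail(dpm_list, consumer)


def add_link(dpm_list, consumer, supplier):
    """Record the dependency and reorder the list to match its direction."""
    supplier.consumers.append(consumer)
    reorder_to_tail(dpm_list, consumer)


def shutdown_order(dpm_list):
    """Assumed model: shutdown walks the list from tail to head."""
    return [dev.name for dev in reversed(dpm_list)]


iommu = Device("iommu")
ohci = Device("ohci-pci")
hub = Device("usb-hub", parent=ohci)  # hypothetical child of the client

# Made-up starting order in which the IOMMU would be shut down first.
dpm_list = [ohci, hub, iommu]
before = shutdown_order(dpm_list)

add_link(dpm_list, consumer=ohci, supplier=iommu)
after = shutdown_order(dpm_list)

print(before)
print(after)
```

In this model the client and its child end up behind the IOMMU in the list,
so they are shut down before it. It also shows the difficulty Rafael
mentions: add_link() needs both device objects in hand before it can do
anything.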

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--