Re: [Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
From: Lukas Wunner
Date: Wed Jul 18 2018 - 04:25:11 EST
On Wed, Jul 18, 2018 at 09:38:41AM +0200, Rafael J. Wysocki wrote:
> On Tue, Jul 17, 2018 at 8:20 PM, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > Okay, the PCI device is suspending and the nvkm_i2c_aux_acquire()
> > wants it in resumed state, so is waiting forever for the device to
> > runtime suspend in order to resume it again immediately afterwards.
> >
> > The deadlock in the stack trace you've posted could be resolved using
> > the technique I used in d61a5c106351 by adding the following to
> > include/linux/pm_runtime.h:
> >
> > static inline bool pm_runtime_status_suspending(struct device *dev)
> > {
> > 	return dev->power.runtime_status == RPM_SUSPENDING;
> > }
> >
> > static inline bool is_pm_work(struct device *dev)
> > {
> > 	struct work_struct *work = current_work();
> >
> > 	return work && work->func == dev->power.work.func;
> > }
> >
> > Then adding this to nvkm_i2c_aux_acquire():
> >
> > 	struct device *dev = pad->i2c->subdev.device->dev;
> >
> > 	if (!(is_pm_work(dev) && pm_runtime_status_suspending(dev))) {
> > 		ret = pm_runtime_get_sync(dev);
> > 		if (ret < 0 && ret != -EACCES)
> > 			return ret;
> > 	}
[snip]
>
> For the record, I don't quite like this approach as it seems to be
> working around a broken dependency graph.
>
> If you need to resume device A from within the runtime resume callback
> of device B, then clearly B depends on A and there should be a link
> between them.
>
> That said, I do realize that it may be the path of least resistance,
> but then I wonder if we can do better than this.
The GPU contains an i2c subdevice for each connector with DDC lines.
I believe those are modelled as children of the GPU's PCI device as
they're accessed via MMIO of the PCI device.

The problem here is that when the GPU's PCI device runtime suspends,
its i2c child device needs to be runtime active to suspend the MST
topology. Catch-22.

I don't know whether or not it's necessary to suspend the MST topology.
I'm not an expert on DisplayPort Multi-Stream Transport.

BTW Lyude, in patches 4 and 5 of this series, you're runtime resuming
pad->i2c->subdev.device->dev. Is this the PCI device or is it the i2c
device? I'm always confused by nouveau's structs. In nvkm_i2c_bus_ctor()
I can see that the device you're runtime resuming is the parent of the
i2c_adapter:

	struct nvkm_device *device = pad->i2c->subdev.device;
	[...]
	bus->i2c.dev.parent = device->dev;

If the i2c_adapter is a child of the PCI device, it's sufficient
to runtime resume the i2c_adapter, i.e. bus->i2c.dev, and this will
implicitly runtime resume its parent.
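
To illustrate that last point, here is a toy userspace model of how
resuming a child pulls its parent up first and how the parent can only
suspend again once the child has. This is not kernel code and is only a
rough sketch of what I understand the PM core to do for parents that
don't set ignore_children; every name in it (model_dev, model_get_sync
and so on) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code: all names are hypothetical. */
enum model_state { MODEL_SUSPENDED, MODEL_ACTIVE };

struct model_dev {
	const char *name;
	struct model_dev *parent;	/* NULL for a root device */
	enum model_state state;
	int usage_count;		/* references taken via model_get_sync() */
	int active_children;		/* children that are currently active */
};

/* Resume dev; an inactive parent is resumed first. */
static void model_resume(struct model_dev *dev)
{
	if (dev->state == MODEL_ACTIVE)
		return;
	if (dev->parent) {
		model_resume(dev->parent);
		dev->parent->active_children++;
	}
	dev->state = MODEL_ACTIVE;
}

/* Take a usage reference on dev and make sure it is active. */
static void model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	model_resume(dev);
}

/*
 * Suspend dev if nobody holds a reference and no child is active,
 * then give the parent a chance to suspend as well.
 * Returns 1 if dev was suspended, 0 otherwise.
 */
static int model_try_suspend(struct model_dev *dev)
{
	if (dev->state == MODEL_SUSPENDED ||
	    dev->usage_count > 0 || dev->active_children > 0)
		return 0;
	dev->state = MODEL_SUSPENDED;
	if (dev->parent) {
		dev->parent->active_children--;
		model_try_suspend(dev->parent);
	}
	return 1;
}

/* Drop a usage reference on dev and suspend it if it became idle. */
static void model_put_sync(struct model_dev *dev)
{
	dev->usage_count--;
	model_try_suspend(dev);
}
```

With a "pci" device as parent of an "i2c" device, model_get_sync() on
the i2c device alone leaves both active without touching the parent's
usage_count, and the parent refuses to suspend until the matching
model_put_sync(). In the driver I'd expect the equivalent to be calling
pm_runtime_get_sync(&bus->i2c.dev) instead of passing device->dev,
assuming runtime PM is actually enabled on the i2c_adapter's device,
which I haven't verified.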
Thanks,
Lukas