Re: [Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths

From: Rafael J. Wysocki
Date: Wed Jul 18 2018 - 03:38:48 EST


On Tue, Jul 17, 2018 at 8:20 PM, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> On Tue, Jul 17, 2018 at 12:53:11PM -0400, Lyude Paul wrote:
>> On Tue, 2018-07-17 at 09:16 +0200, Lukas Wunner wrote:
>> > On Mon, Jul 16, 2018 at 07:59:25PM -0400, Lyude Paul wrote:
>> > > In order to fix all of the spots that need to have runtime PM get/puts()
>> > > added, we need to ensure that it's possible for us to call
>> > > pm_runtime_get/put() in any context, regardless of how deep, since
>> > > almost all of the spots that are currently missing refs can potentially
>> > > get called in the runtime suspend/resume path. Otherwise, we'll try to
>> > > resume the GPU as we're trying to resume the GPU (and vice-versa) and
>> > > cause the kernel to deadlock.
>> > >
>> > > With this, it should be safe to call the pm runtime functions in any
>> > > context in nouveau with one condition: any point in the driver that
>> > > calls pm_runtime_get*() cannot hold any locks owned by nouveau that
>> > > would be acquired anywhere inside nouveau_pmops_runtime_resume().
>> > > This includes modesetting locks, i2c bus locks, etc.
>> >
>> > [snip]
>> > > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
>> > > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> > > @@ -835,6 +835,8 @@ nouveau_pmops_runtime_suspend(struct device *dev)
>> > >  		return -EBUSY;
>> > >  	}
>> > >
>> > > +	dev->power.disable_depth++;
>> > > +
>> >
>> > Anyway, if I understand the commit message correctly, you're hitting a
>> > pm_runtime_get_sync() in a code path that itself is called during a
>> > pm_runtime_get_sync(). Could you include stack traces in the commit
>> > message? My gut feeling is that this patch masks a deeper issue,
>> > e.g. if the runtime_resume code path does in fact directly poll outputs,
>> > that would seem wrong. Runtime resume should merely make the card
>> > accessible, i.e. reinstate power if necessary, put into PCI_D0,
>> > restore registers, etc. Output polling should be scheduled
>> > asynchronously.
>>
>> So: the reason that patch was added was mainly for the patches later in the
>> series that add guards around the i2c bus and aux bus, since both of those
>> require that the device be awake for it to work. Currently, the spot where it
>> would recurse is:
>
> Okay, the PCI device is suspending and the nvkm_i2c_aux_acquire()
> wants it in resumed state, so is waiting forever for the device to
> runtime suspend in order to resume it again immediately afterwards.
>
> The deadlock in the stack trace you've posted could be resolved using
> the technique I used in d61a5c106351 by adding the following to
> include/linux/pm_runtime.h:
>
> static inline bool pm_runtime_status_suspending(struct device *dev)
> {
> 	return dev->power.runtime_status == RPM_SUSPENDING;
> }
>
> static inline bool is_pm_work(struct device *dev)
> {
> 	struct work_struct *work = current_work();
>
> 	/* dev->power.work is a struct work_struct, so compare addresses. */
> 	return work == &dev->power.work;
> }
>
> Then adding this to nvkm_i2c_aux_acquire():
>
> 	struct device *dev = pad->i2c->subdev.device->dev;
>
> 	if (!(is_pm_work(dev) && pm_runtime_status_suspending(dev))) {
> 		ret = pm_runtime_get_sync(dev);
> 		if (ret < 0 && ret != -EACCES)
> 			return ret;
> 	}
>
> But here's the catch: This only works for an *async* runtime suspend.
> It doesn't work for pm_runtime_put_sync(), pm_runtime_suspend() etc,
> because then the runtime suspend is executed in the context of the caller,
> not in the context of dev->power.work.
>
> So it's not a full solution, but hopefully something that gets you
> going. I'm afraid I'm not familiar enough with the code paths leading
> to nvkm_i2c_aux_acquire() to come up with a full solution off the top
> of my head.
>
> Note, it's not sufficient to just check pm_runtime_status_suspending(dev)
> because if the runtime_suspend is carried out concurrently by something
> else, this will return true but it's not guaranteed that the device is
> actually kept awake until the i2c communication has been fully performed.

For the record, I don't quite like this approach as it seems to be
working around a broken dependency graph.

If you need to resume device A from within the runtime resume callback
of device B, then clearly B depends on A and there should be a device
link between them.
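
To illustrate the ordering such a dependency is meant to guarantee, here
is a minimal user-space model, not kernel code. Everything in it
(struct model_dev, model_get(), model_put()) is hypothetical and only
mimics the usage counting: taking a reference on consumer B first takes
one on supplier A, and dropping it releases them in the reverse order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct device; not a kernel type. */
struct model_dev {
	const char *name;
	struct model_dev *supplier;	/* models the dependency, NULL if none */
	int usage_count;		/* models the runtime PM usage counter */
	int active;			/* 1 = resumed, 0 = suspended */
};

/* Models a synchronous "get": the supplier is resumed before the consumer. */
static void model_get(struct model_dev *dev)
{
	if (dev->usage_count++ > 0)
		return;			/* already active, nothing to resume */
	if (dev->supplier)
		model_get(dev->supplier);
	dev->active = 1;
	printf("resume %s\n", dev->name);
}

/* Models a synchronous "put": the consumer is suspended before the supplier. */
static void model_put(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count > 0)
		return;			/* still in use by someone else */
	dev->active = 0;
	printf("suspend %s\n", dev->name);
	if (dev->supplier)
		model_put(dev->supplier);
}
```

With B's supplier pointer set to A, model_get(&b) resumes A and then B,
and model_put(&b) suspends B and then A, so B's callbacks never have to
resume A themselves. In the kernel, the corresponding setup would be a
call along the lines of device_link_add(consumer, supplier,
DL_FLAG_PM_RUNTIME); which nouveau devices would play the roles of A and
B is not something this thread settles.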

That said, I do realize that it may be the path of least resistance,
but then I wonder if we can do better than this.

Thanks,
Rafael