Re: 5.17-rc regression: rmi4 clients cannot deal with asynchronous suspend? (was: X1 Carbon touchpad not resumed)

From: Loic Poulain
Date: Tue Feb 08 2022 - 06:25:17 EST


Hi folks,

On Tue, 8 Feb 2022 at 03:50, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>
> On Mon, 7 Feb 2022, Dmitry Torokhov wrote:
> > On Mon, Feb 07, 2022 at 01:41:36PM -0800, Rajat Jain wrote:
> > > +linux-input@xxxxxxxxxxxxxxx
> > >
> > > On Mon, Feb 7, 2022 at 1:09 PM Rajat Jain <rajatja@xxxxxxxxxx> wrote:
> > > >
> > > > +Rafael (for any inputs on asynchronous suspend / resume)
> > > > +Dmitry Torokhov (since no other maintainer of rmi4 in MAINTAINERS file)
> > > > +loic.poulain@xxxxxxxxxx (who fixed RMI device hierarchy recently)
> > > > + Some Synaptics folks (from recent commits - Vincent Huang, Andrew
> > > > Duggan, Cheiny)
> > > >
> > > > On Mon, Feb 7, 2022 at 12:23 PM Wolfram Sang <wsa@xxxxxxxxxx> wrote:
> > > > >
> > > > > Hello Hugh,
> > > > >
> > > > > > Bisection led to 172d931910e1db800f4e71e8ed92281b6f8c6ee2
> > > > > > ("i2c: enable async suspend/resume on i2c client devices")
> > > > > > and reverting that fixes it for me.
> > > > >
> > > > > Thank you for the report plus bisection and sorry for the regression!
> > > >
> > > > +1, Thanks for the bisection, and apologies for the inconveniences.
> > > >
> > > > The problem here seems to be that for some reason, some devices (all
> > > > connected to rmi4 adapter) failed to resume, but only when
> > > > asynchronous suspend is enabled (by 172d931910e1):
> > > >
> > > > [ 79.221064] rmi4_smbus 6-002c: failed to get SMBus version number!

Looks like this is the initial issue. Does the rmi device lose power
while suspended? If so, could it be that enabling async_suspend makes
the device resume earlier, at a time when it is not yet ready? What if
you simply start with a naive msleep(200) in rmi_smb_resume()?

The rmi4 bus does not rely on generic device suspend/resume
infrastructure for its subdevices, so async_suspend should only impact
the moment at which the smbus rmi4 root device is resumed, but not the
way it and its subdevices are resumed.

It would be interesting to get some pm_debug/pm_trace output to compare
the good/bad cases.
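
Something along these lines might do for collecting the logs. It is an
untested sketch: set_knob, prepare_run and SYS_POWER are names made up
for illustration, and it assumes a kernel that exposes pm_print_times,
pm_debug_messages and pm_async under /sys/power (knobs that are missing
are simply skipped). I left pm_trace out on purpose, since it stores
its data in the RTC and clobbers the system time.

```shell
#!/bin/sh
# Sketch: arm the PM debug knobs before a suspend/resume cycle.
# SYS_POWER is overridable so the helpers can be tried on a scratch dir.
SYS_POWER="${SYS_POWER:-/sys/power}"

# Write value $2 to knob $1 if this kernel exposes it and it is writable.
set_knob() {
    if [ -w "$SYS_POWER/$1" ]; then
        echo "$2" > "$SYS_POWER/$1" && echo "set $1=$2"
    else
        echo "skip $1 (not available)"
    fi
}

# $1 = 1 for asynchronous suspend/resume (bad case), 0 for synchronous.
prepare_run() {
    set_knob pm_print_times 1     # per-device callback timings in dmesg
    set_knob pm_debug_messages 1  # extra messages from the PM core
    set_knob pm_async "$1"        # global async suspend/resume switch
}

# Only act when a mode is given, e.g. "sh pm-knobs.sh 0" as root.
if [ "$#" -gt 0 ]; then
    prepare_run "$1"
fi
```

Run it once with 0 and once with 1, suspend and resume after each, and
save dmesg from both runs so the resume order and timing of 6-002c can
be compared.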

> > > > [ 79.265074] rmi4_physical rmi4-00: rmi_driver_reset_handler: Failed
> > > > to read current IRQ mask.
> > > > [ 79.308330] rmi4_f01 rmi4-00.fn01: Failed to restore normal operation: -6.
> > > > [ 79.308335] rmi4_f01 rmi4-00.fn01: Resume failed with code -6.
> > > > [ 79.308339] rmi4_physical rmi4-00: Failed to suspend functions: -6
> > > > [ 79.308342] rmi4_smbus 6-002c: Failed to resume device: -6
> > > > [ 79.351967] rmi4_physical rmi4-00: Failed to read irqs, code=-6
> > > >
> > > > A resume failure that only shows up during asynchronous resume,
> > > > typically means that the device is dependent on some other device to
> > > > resume first, but this dependency is NOT established in a parent child
> > > > relationship (which is wrong and needs to be fixed, perhaps using
> > > > device_link_add()). Thus the kernel may be resuming these devices
> > > > without first resuming some other device that these devices need to
> > > > depend on.
> > > >
> > > > TBH, I'm not sure how to fix this. The only hint I see is that all of
> > > > these devices seem to be attached to rmi4 device so perhaps something
> > > > there? I see 6e4860410b828f recently fixed device hierarchy for rmi4,
> > > > and so seemingly should have fixed this very issue (as also seen in
> > > > commit message)?
> > > >
> > > > >
> > > > > I will wait a few days if people come up with a fix. If not, I will
> > > > > revert the offending commit.
> > > >
> > > > While I'll be sad because this means no i2c-client can now resume in
> > > > parallel and increases resume latency by a *LOT* (hundreds of ms on
> > > > all Linux systems), I understand that this needs to be done unless
> > > > someone comes up with a fix.
> >
> > There is an intricate dance happening when switching the touchpad from
> > legacy PS/2 into RMI mode, with the touchpad being dependent not only on
> > the SMBus controller, but also on the i8042 keyboard controller and its
> > PS/2 port (or rather their emulation by the system firmware).
> >
> > I wonder if we could apply a little bit more targeted patch:
> >
> > diff --git a/drivers/input/rmi4/rmi_smbus.c b/drivers/input/rmi4/rmi_smbus.c
> > index 2407ea43de59..3901d06d38ca 100644
> > --- a/drivers/input/rmi4/rmi_smbus.c
> > +++ b/drivers/input/rmi4/rmi_smbus.c
> > @@ -335,6 +335,7 @@ static int rmi_smb_probe(struct i2c_client *client,
> >  		return error;
> >  	}
> >
> > +	device_disable_async_suspend(&client->dev);
> >  	return 0;
> >  }
> >
> >
> > ... and if that works then we can try to establish proper dependencies
> > via device links later.
> >
> > Hugh, could you please try this out and see if it helps?
>
> Yes, that works nicely, thanks Dmitry.
>
> By the way, my memory's been jogged by "rmi4" and the discussion above:
> I had a similar-ish problem with it a year ago, discussed with PM guys,
>
> https://lore.kernel.org/linux-pm/alpine.LSU.2.11.2101102010200.25762@eggly.anvils/
>
> I'm not saying you have to read through that thread, but you may find
> some relevance in it - Saravana concluded rmi4 driver isn't capturing
> parent/child relationship correctly (at that time, anyway).