Re: [PATCH] wifi: mwifiex: decrease timeout waiting for host sleep from 10s to 5s
From: Doug Anderson
Date: Wed Dec 04 2024 - 14:19:31 EST
Hi,
On Wed, Dec 4, 2024 at 5:45 AM Pin-yen Lin <treapking@xxxxxxxxxxxx> wrote:
>
> > > > Can you try testing (and gather timing numbers) when
> > > > suspending soon after initiating scans? It's hard to judge what the
> > > > lower limit of this timeout should really be without any numbers, just
> > > > like it's hard to judge whether your 10 second watchdog is reasonable.
> > >
> > > Pin-yen: is this something you could gather?
>
> I tried entering suspend right after wifi scans, and the time spent in
> mwifiex_enable_hs() is always around 100ms. It seems initiating
> suspend does not increase the execution time for mwifiex_enable_hs(),
> so I think the driver is capable of interrupting a scan.
> > >
> > >
> > > > Also, for the record, since we might have to field regression reports
> > > > for other systems: what hardware (chip variant, FW version) are you
> > > > seeing problems on?
> > >
> > > Pin-yen: I'm assuming you'll provide this.
>
> From the debugfs entry:
>
> driver_name = "mwifiex"
> driver_version = mwifiex 1.0 (15.68.19.p54)
> verext = w8897o-B0, RF87XX, FP68, 15.68.19.p54
>
> The compatible string of the DT is "marvell,sd8897".
> >
> > I'll leave it up to y'all (Doug and Pin-Yen) whether you want to provide
> > the above to provide a little more confidence, or if you want to
> > reconsider your use of CONFIG_DPM_WATCHDOG_TIMEOUT.
>
> It looks okay to me to decrease the timeout here given that scanning
> doesn't seem to affect the suspend time. What's your thought, Doug?
I think Brian is right and that we should change how we're using the
DPM watchdog, but IMO that doesn't mean that we couldn't also change
this timeout.
If nothing else, you'd want to post a v2 of your patch containing a
summary of the info you've gathered so it's recorded with the patch.
Maybe what makes the most sense, though, is to put our money where our
mouth is and land some sort of patch in the ChromeOS tree and then
report back to upstream in a month when we have data about it. If
things look good in the field then presumably that would give some
confidence for upstream to be willing to land the patch?
Probably about the best data we could gather would be to break the
existing timeout into two halves. You could wait half the time, then
print a warning, and then wait the other half the time. We could even
land that change _without_ changing the timeout to 5 seconds. Then if
we ever see the warning print but then the suspend succeeds we know
that there are cases where waiting longer would have helped. If we
made that a WARN_ON() our existing infrastructure would help us gather
that info...
...so I guess summarizing my rambling, a good plan would be:
1. Change the ChromeOS DPM watchdog timeout to 15 seconds to avoid the
problem for now.
2. Land a patch to do a WARN_ON() in mwifiex if we take 5 seconds.
Actually, you could (probably) send this upstream and land it
FROMLIST?
3. Wait ~1-2 months and see if we ever see the WARN_ON() trigger but
then things succeed after 10 seconds. If this never happens then send
a v2 patch changing the timeout to 5 seconds with details in the
commit message.
4. In the background, see if we can convince folks to make the DPM
Watchdog less panicky.
-Doug