Re: [PATCH] wifi: mwifiex: decrease timeout waiting for host sleep from 10s to 5s
From: Pin-yen Lin
Date: Wed Dec 04 2024 - 09:03:47 EST
Hi Brian,
On Wed, Dec 4, 2024 at 10:04 AM Brian Norris <briannorris@xxxxxxxxxxxx> wrote:
>
> Hi Doug,
>
> On Tue, Dec 03, 2024 at 05:38:55PM -0800, Doug Anderson wrote:
> > On Tue, Dec 3, 2024 at 3:05 PM Brian Norris <briannorris@xxxxxxxxxxxx> wrote:
> > > The Kconfig default is 120 seconds, and it's hidden under
> > > CONFIG_EXPERT. What makes you think 10 is a good value? (It sounds
> > > pretty small for triggering panics.) The smallest I can see outside of
> > > ChromeOS is 12 seconds, although 60 seconds is much more common. I
> > > also happen to see other WiFi drivers have hit similar problems, but
> > > they settled on 20 seconds (assuming a 60s timeout on other distros):
> > > https://lore.kernel.org/linux-wireless/20230329162038.8637-1-kvalo@xxxxxxxxxx/
> > > https://git.kernel.org/linus/cf5fa3ca0552f1b7ba8490de40700bbfb6979b17
> > >
> > > Technically, this Kconfig lets you set a value as small as 1 second. I
> > > don't think we should work at reducing all timeouts to fit that
> > > target.
> > >
> > > I could maybe approve lowering this one, but I'd also recommend
> > > reconsidering your timeout value. And more questions below.
> >
> > That's fair. I guess having a 10 second timeout for full system
> > suspend didn't seem totally crazy to me. If a system is taking more
> > than 10 seconds to do a full system suspend then that seems like
> > something is pretty broken. I guess it's somewhat like the same
> > argument that the WiFi driver had for picking 10 seconds but applied
> > to the whole system level, and I guess that's where we get into
> > trouble. That made me think that even 5 seconds seems a bit long for
> > any given driver to suspend. ...but yeah, it's squishy.
>
> 10 seconds is likely that *something* is wrong (or at least suboptimal),
> but IMO, it's not quite at unreasonable levels. But yes, my point was
> mainly that it's squishy, and I personally wouldn't want to be the one
> running with the lowest CONFIG_DPM_WATCHDOG_TIMEOUT out there, given the
> known behavior of multiple drivers and the timeout-means-panic behavior.
>
> > Maybe the ChromeOS should change to 15 seconds for the DPM Watchdog
> > timer and that's a better solution and leave the WiFi driver how it
> > is?
>
> That seems reasonable.
>
> To be clear, I'm OK with this patch, if we can get a little more
> confidence in it (like the timing data and HW info). I *think* 5 vs 10
> isn't a big deal here, but I don't have much other than my guess at the
> moment.
>
> > Another thought: I wonder if it's possible to be dynamic and somehow
> > set the timeout as "max(10, dpm_watchdog_timeout / 2)". Not that I've
>
> You probably meant min()?
>
> > checked to see if the mwifiex can actually query the DPM watchdog
> > timeout... ;-)
>
> Yeah, I wondered similarly -- or in reverse, if we could somehow "pat"
> the watchdog or prime it with an expected timeout. But AFAICT, neither
> such feature exists today.
>
> > ...also, it sure seems like if we're going to choose a value so low
> > that we shouldn't panic. All of our other watchdogs that panic aren't
> > so short, so probably this one shouldn't be either. Maybe we could
> > submit a patch to make the DPM watchdog just abort the suspend if
> > that's not too hard (and if the power people accept it).
>
> Yeah, if you made the watchdog just interrupt suspend and dump some
> warnings, then the effect would be pretty similar to this patch.
>
> > > I wonder what the distribution of (successful) timing is today. I'd
> > > assume this typically take on the order of milliseconds, but do we
> > > commonly see outliers? What if a system is fully loaded while
> > > suspending?
> >
> > I would hope this doesn't affect things from the DPM watchdog, but I
> > haven't researched. Hopefully the DPM watchdog starts after userspace
> > is frozen so the system being fully loaded shouldn't matter?
>
> I was just throwing out ideas, but I didn't specifically mean user
> space. You provided a few more ideas. Anyway, I was just fishing for
> *some* attempt at real-world (and, as-bad-as-you-can-simulate world)
> numbers, since that's the point of a timeout like this.
>
> > > Can you try testing (and gather timing numbers) when
> > > suspending soon after initiating scans? It's hard to judge what the
> > > lower limit of this timeout should really be without any numbers, just
> > > like it's hard to judge whether your 10 second watchdog is reasonable.
> >
> > Pin-yen: is this something you could gather?
I tried entering suspend right after wifi scans, and the time spent in
mwifiex_enable_hs() is always around 100ms. It seems initiating
suspend does not increase the execution time for mwifiex_enable_hs(),
so I think the driver is capable of interrupting a scan.
> >
> >
> > > Also, for the record, since we might have to field regression reports
> > > for other systems: what hardware (chip variant, FW version) are you
> > > seeing problems on?
> >
> > Pin-yen: I'm assuming you'll provide this.
>From the debugfs entry:
driver_name = "mwifiex"
driver_version = mwifiex 1.0 (15.68.19.p54)
verext = w8897o-B0, RF87XX, FP68, 15.68.19.p54
The compatible string of the DT is "marvell,sd8897".
>
> I'll leave it up to y'all (Doug and Pin-Yen) whether you want to provide
> the above to provide a little more confidence, or if you want to
> reconsider your use of CONFIG_DPM_WATCHDOG_TIMEOUT.
It looks okay to me to decrease the timeout here given that scanning
doesn't seem to affect the suspend time. What's your thought, Doug?
>
> Brian
Regards,
Pin-yen