Re: [PATCH 4/4] sched/core: split iowait state into two states

From: Peter Zijlstra
Date: Thu Apr 25 2024 - 06:16:51 EST


On Wed, Apr 24, 2024 at 11:08:42AM +0100, Christian Loehle wrote:
> On 24/04/2024 11:01, Peter Zijlstra wrote:
> > On Tue, Apr 16, 2024 at 06:11:21AM -0600, Jens Axboe wrote:
> >> iowait is a bogus metric, but it's helpful in the sense that it allows
> >> short waits to not enter sleep states that have a higher exit latency
> >> than would otherwise have been picked for iowait'ing tasks. However,
> >> it's harmful in that lots of applications and monitoring assume that
> >> iowait is busy time, or otherwise use it as a health metric.
> >> Particularly for async IO it's entirely nonsensical.
> >
> > Let me get this straight, all of this is about working around
> > cpuidle menu governor insanity?
> >
> > Rafael, how far along are we with fully deprecating that thing? Yes it
> > still exists, but should people really be using it still?
> >
>
> Well there is also the iowait boost handling in schedutil and intel_pstate
> which, at least in synthetic benchmarks, does have an effect [1].

Those are cpufreq, not cpuidle, and at least they don't use nr_iowait.
The original Changelog mentioned idle states, and I hate on menu for
using nr_iowait.

> io_uring (the only user of iowait but not iowait_acct) works around both.
>
> See commit 8a796565cec3 ("io_uring: Use io_schedule* in cqring wait")
>
> [1]
> https://lore.kernel.org/lkml/20240304201625.100619-1-christian.loehle@xxxxxxx/#t

So while I agree with most of the shortcomings listed in that set,
that patch is quite terrifying.

I would prefer to start with something a *lot* simpler. How about a
tick-driven decay of a per-task iops count? And that whole step array
*shudder*.
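
For illustration only, a rough userspace sketch of what a tick-driven
decay of a per-task iops count could look like. This is not code from
the patch set or from the kernel: struct task_io_stat, the helper
names, and the IOPS_UNIT / IOPS_BOOST_THRESHOLD constants are all
hypothetical, and halving on every tick is just one possible decay
rule.

```c
#include <stdbool.h>

/* Hypothetical per-task state; not a real kernel structure. */
struct task_io_stat {
	unsigned int iops;	/* decaying count of recent I/O waits */
};

/* Assumed constants, chosen only to make the example concrete. */
#define IOPS_UNIT		8u	/* weight added per I/O wait */
#define IOPS_BOOST_THRESHOLD	8u	/* boost at or above this value */

/* Called when the task blocks on I/O: bump the count. */
static inline void task_io_wait(struct task_io_stat *t)
{
	t->iops += IOPS_UNIT;
}

/* Called from the periodic tick: halve the count so old I/O fades out. */
static inline void task_io_tick(struct task_io_stat *t)
{
	t->iops >>= 1;
}

/* A task that did I/O recently enough still qualifies for a boost. */
static inline bool task_wants_boost(const struct task_io_stat *t)
{
	return t->iops >= IOPS_BOOST_THRESHOLD;
}
```

With these assumed values, a single I/O wait qualifies the task for a
boost until the next tick, while a task that keeps issuing I/O stays
above the threshold. There is no step array to maintain, only one
counter per task and one shift per tick.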