Re: [PATCH 2/2] sched/core: split iowait state into two states

From: Thomas Gleixner
Date: Thu Feb 29 2024 - 12:31:18 EST


On Wed, Feb 28 2024 at 12:16, Jens Axboe wrote:
> iowait is a bogus metric, but it's helpful in the sense that it allows
> short waits to not enter sleep states that have a higher exit latency
> than we would've picked for iowait'ing tasks. However, it's harmful in
> that lots of applications and monitoring assume that iowait is busy
> time, or otherwise use it as a health metric. Particularly for async
> IO it's entirely nonsensical.
>
> Split the iowait part into two parts - one that tracks whether we need
> boosting for short waits, and one that says we need to account the
> task

We :)

> as such. ->in_iowait_acct nests inside of ->in_iowait, both for
> efficiency reasons and so that the relationship between the two
> is clear. A waiter may set ->in_iowait alone and not care about the
> accounting.

> +/*
> + * Returns a token which is comprised of the two bits of iowait wait state -
> + * one is whether we're marking ourselves as in iowait for cpufreq reasons,
> + * and the other is if the task should be accounted as such.
> + */
>  int io_schedule_prepare(void)
>  {
> -	int old_iowait = current->in_iowait;
> +	int old_wait_flags = 0;
> +
> +	if (current->in_iowait)
> +		old_wait_flags |= TASK_IOWAIT;
> +	if (current->in_iowait_acct)
> +		old_wait_flags |= TASK_IOWAIT_ACCT;
>
>  	current->in_iowait = 1;
> +	current->in_iowait_acct = 1;
>  	blk_flush_plug(current->plug, true);
> -	return old_iowait;
> +	return old_wait_flags;
>  }
>
> -void io_schedule_finish(int token)
> +void io_schedule_finish(int old_wait_flags)
>  {
> -	current->in_iowait = token;
> +	if (!(old_wait_flags & TASK_IOWAIT))
> +		current->in_iowait = 0;
> +	if (!(old_wait_flags & TASK_IOWAIT_ACCT))
> +		current->in_iowait_acct = 0;

Why? TASK_IOWAIT_ACCT requires TASK_IOWAIT, right? So if TASK_IOWAIT was
not set then TASK_IOWAIT_ACCT must have been clear too, no?
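
To illustrate the point, here is a rough userspace model, not kernel code.
It assumes the nesting described in the changelog actually holds. struct
task, prepare(), finish_patch() and finish_nested() are made-up names, and
the flag values are arbitrary. With the nesting there are only three valid
old states, and a finish variant which relies on it restores all of them
the same way as the two independent checks in the patch:

```c
/* Userspace model only. struct task and the flag values are made up. */
#include <assert.h>

#define TASK_IOWAIT		1
#define TASK_IOWAIT_ACCT	2

struct task {
	unsigned int in_iowait:1;
	unsigned int in_iowait_acct:1;
};

/* Mirrors io_schedule_prepare() from the patch, minus the plug flush */
static int prepare(struct task *t)
{
	int old_wait_flags = 0;

	if (t->in_iowait)
		old_wait_flags |= TASK_IOWAIT;
	if (t->in_iowait_acct)
		old_wait_flags |= TASK_IOWAIT_ACCT;

	t->in_iowait = 1;
	t->in_iowait_acct = 1;
	return old_wait_flags;
}

/* Mirrors io_schedule_finish() from the patch: two independent checks */
static void finish_patch(struct task *t, int old_wait_flags)
{
	if (!(old_wait_flags & TASK_IOWAIT))
		t->in_iowait = 0;
	if (!(old_wait_flags & TASK_IOWAIT_ACCT))
		t->in_iowait_acct = 0;
}

/* Hypothetical variant which relies on ACCT implying IOWAIT */
static void finish_nested(struct task *t, int old_wait_flags)
{
	/* ACCT set without IOWAIT would violate the nesting */
	assert(!(old_wait_flags & TASK_IOWAIT_ACCT) ||
	       (old_wait_flags & TASK_IOWAIT));

	/* ACCT was set, so IOWAIT was set too: nothing to restore */
	if (old_wait_flags & TASK_IOWAIT_ACCT)
		return;

	t->in_iowait_acct = 0;
	if (!(old_wait_flags & TASK_IOWAIT))
		t->in_iowait = 0;
}
```

Whether the nested form is worth having is a separate question. The model
only shows that the fourth combination, ACCT set with IOWAIT clear, is the
one case where the two variants could differ, and that case should not
exist.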

Thanks,

tglx