Re: [PATCHSET v6 0/4] Split iowait into two states

From: Pavel Begunkov
Date: Tue Apr 01 2025 - 04:20:07 EST


On 3/31/25 11:33, Christian Loehle wrote:
On 3/31/25 10:02, Pavel Begunkov wrote:
On 8/19/24 16:39, Jens Axboe wrote:
Hi,

This is v6 of the patchset where the current in_iowait state is split
into two parts:

1) The "task is sleeping waiting on IO", and would like cpufreq goodness
    in terms of sleep and wakeup latencies.
2) The above, and also accounted as such in the iowait stats.

The current ->in_iowait covers both, this series splits it into two types
of state so that each can be controlled seperately.

Patches 1..3 are prep patches, changing the type of
task_struct->nr_iowait and adding helpers to manipulate the iowait counts.

Patch 4 does the actual splitting.

This has been sitting for a while, would be nice to get this queued up
for 6.12. Comments welcome!

Good day,

Did anything good happened with these patches or related work?
Christian>

Hi Pavel,
so for cpuidle part we've had commit ("38f83090f515 cpuidle: menu: Remove iowait influence")
for a while now without much complaints, hopefully that means it stays in.
So I'd really like to know how the results still compare for relevant workloads.

Sounds great

cpufreq iowait boosting is still a thing in schedutil and intel_pstate,
and so far I've failed to convince Rafael and Peter to get rid of it.
I still think that is the right thing to do, but it does come with a
regression in most of the simple synthetic fio tests.

IOW, from the io_uring iowait stat problem perspective it got stuck
and is unlikely to move short term.

Reminder: the goal is to let io_uring to keep using iowait boosting
but avoid reporting it in the iowait stats, because the jump in the
stat spooks users. I know at least several users carrying out of tree
patches to work it around. And, apparently, disabling the boosting
causes perf regressions.

Details would be appreciated, I looked the the postgres workload that
justified it initially and that was on cpuidle iowait which is no
longer a thing.

I wasn't involved and afraid don't have any extra numbers.

I'm reading through the thread, but unless I missed something, it looks
like the patchset is actually aligned with future plans on iowait
mentioned in the thread, in a sense that it reduces the exposure to
the user space, and, when it's time, a better approach will be able
replaces it with no visible effect to the user.

I'm not against $subject necessarily, it's clearly a hack tapering
over this but as I've mentioned I'm fine carrying a revert of $subject
for a future series on iowait boosting.


On the other hand, there seems to be a work around io_uring patch
queued for, which I quite dislike from io_uring perspective but also
because it exposes even more of iowait to the user.
I can understand why it's there, it has been over a year since v1,
but maybe we can figure something out before it's released? Would
it be fine to have something similar to this series? Any other
ideas?

Ah thank you, I've missed this
https://lore.kernel.org/io-uring/f548f142-d6f3-46d8-9c58-6cf595c968fb@xxxxxxxxx/
Would be nice if this lead to more numbers comparing the two at least.

Sure, but I'd rather avoid adding this type of a uapi just to test
it and solve the problem a different way after.

--
Pavel Begunkov