Re: [PATCHSET v2 0/2] Split iowait into two states

From: Jens Axboe
Date: Tue Feb 27 2024 - 21:21:48 EST


On 2/27/24 2:06 PM, Jens Axboe wrote:
> I haven't been able to properly benchmark patch 1, as the atomics are
> noise in any workloads that approximate normality. I can certainly
> concoct a synthetic test case if folks are interested. My gut says that
> we're trading 3 fast path atomics for none, and with the 4th case
> _probably_ being way less likely. There we grab the rq lock.

OK, so on Chris's suggestion, I tried his schbench to exercise the
scheduling side. It's very futex intensive, so I hacked up futex to set
iowait state when sleeping. I also added simple accounting to that path
so I knew how many times it ran. A run of:

/schbench -m 60 -t 10 -p 8

on a 2 socket Intel(R) Xeon(R) Platinum 8458P with 176 threads shows no
regression in performance, and try_to_wake_up() locking the rq of the
task being scheduled in from another CPU doesn't seem to register much.
On the previous run, I saw 2.21% there and now it's 2.36%. But it was
also a better performing run, which may have led to the increase.

Each run takes 30 seconds, and during that time I see around 290-310M
hits of that path, or about 10M/sec. Without modifying futex to use
iowait, we obviously rarely hit it. About 200 times per run, which
makes sense as we're not really doing IO.
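
For anyone who wants to sanity check the rates, here's a quick sketch of
the arithmetic. It only uses the counts and the 30 second run time quoted
above; the helper name and constants are made up for illustration and
aren't part of the patchset or of schbench.

```python
# Back-of-the-envelope check of the hit rates quoted above.
# RUN_SECONDS and hits_per_second() are illustrative names only.
RUN_SECONDS = 30  # each schbench run takes 30 seconds


def hits_per_second(total_hits, seconds=RUN_SECONDS):
    """Average number of hits per second over one run."""
    return total_hits / seconds


# With futex hacked to set iowait: 290-310M hits per run.
low = hits_per_second(290e6)
high = hits_per_second(310e6)

# With unmodified futex: about 200 hits per run.
baseline = hits_per_second(200)

print(f"iowait futex hack: {low / 1e6:.1f}M-{high / 1e6:.1f}M hits/sec")
print(f"unmodified futex:  {baseline:.1f} hits/sec")
```

That works out to roughly 9.7M-10.3M hits/sec with the hack, versus
under 7 hits/sec without it.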

Anyway, just some data on this. If I leave the futex/pipe iowait in and
run the same test, I see no discernible difference in profiles. In fact,
the highest cost across the tests is bringing in the task->in_iowait
cacheline.

--
Jens Axboe