[RFC] Scheduler: DMA Engine regression because of sched/fair changes

From: Alexander Fomichev
Date: Wed Jan 12 2022 - 10:26:23 EST


CC: Mel Gorman <mgorman@xxxxxxx>
CC: linux@xxxxxxxxx

Hi all,

I have found a large regression in Intel Xeon DMA Engine performance
between v4.14 LTS and current kernels. In certain circumstances the
throughput reported by dmatest is more than 6 times lower.

- Hardware -
I did testing on 2 systems:
1) Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz (Supermicro X11DAi-N)
2) Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz (YADRO Vegman S220)

- Measurement -
The throughput reported by dmatest drops with almost any test settings,
but the impact is most significant with 64K transfers. The following
parameters were used:

modprobe dmatest iterations=1000 timeout=2000 test_buf_size=0x100000 transfer_size=0x10000 norandom=1
echo "dma0chan0" > /sys/module/dmatest/parameters/channel
echo 1 > /sys/module/dmatest/parameters/run

Every test case was run at least 3 times. All detailed results are
below.
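
For anyone who wants to compare runs without reading the logs by eye, here
is a minimal Python sketch that pulls iops and KB/s out of the dmatest
summary lines. It is not part of my test setup; the regular expression is
written only against the summary format quoted in this mail, and the names
parse_summaries() and mean_kbps() are made up for illustration.

```python
import re
from statistics import mean

# Assumption: matches the dmatest summary line exactly as quoted in this
# mail; other kernel versions may print a different format.
SUMMARY_RE = re.compile(
    r"dmatest: (?P<thread>\S+): summary (?P<tests>\d+) tests, "
    r"(?P<failures>\d+) failures (?P<iops>[\d.]+) iops (?P<kbps>\d+) KB/s"
)


def parse_summaries(log_text):
    """Return a list of (iops, KB/s) tuples, one per summary line found."""
    results = []
    for line in log_text.splitlines():
        m = SUMMARY_RE.search(line)
        if m:
            results.append((float(m.group("iops")), int(m.group("kbps"))))
    return results


def mean_kbps(log_text):
    """Mean throughput in KB/s over all summary lines in the log text."""
    return mean(kbps for _, kbps in parse_summaries(log_text))


if __name__ == "__main__":
    # One run from each of two system (1) result blocks in this mail.
    before = ("[ 528.521915] dmatest: dma0chan0-copy0: summary 1000 tests, "
              "0 failures 98367.10 iops 6295494 KB/s (0)")
    after = ("[ 181.606531] dmatest: dma0chan0-copy0: summary 1000 tests, "
             "0 failures 14194.86 iops 908471 KB/s (0)")
    print(f"slowdown: {mean_kbps(before) / mean_kbps(after):.2f}x")
```

Feeding it a whole dmesg dump works as well, since lines that are not
summary lines are skipped.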

- Analysis -
Bisecting revealed 2 different bad commits for those 2 systems, but both
change the same function/condition in the same file.
For the system (1) the bad commit is:
[7332dec055f2457c386032f7e9b2991eb05c2a0a] sched/fair: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache
For the system (2) the bad commit is:
[806486c377e33ab662de6d47902e9e2a32b79368] sched/fair: Do not migrate if the prev_cpu is idle

- Additional check -
As an additional check, I also tested a dirty patch on top of the
(current) kernel v5.16.0-rc5 that undoes the changes above:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6f16dfb74246..0a58cc00b1b8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5931,8 +5931,8 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
* a cpufreq perspective, it's better to have higher utilisation
* on one CPU.
*/
- if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
- return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
+ if (available_idle_cpu(this_cpu))
+ return this_cpu;

if (sync && cpu_rq(this_cpu)->nr_running == 1)
return this_cpu;

Please take a look and let me know whether this makes sense. With this
patch applied, DMA Engine performance is largely restored.
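
To spell out what the dirty patch changes, here is a toy Python model of
the two checks visible in the hunk above. It is only a sketch: the boolean
arguments stand in for available_idle_cpu(), cpus_share_cache() and the
nr_running test, the function names are made up, and None means "falls
through to code outside the quoted hunk". It says nothing about why the
DMA Engine is slower, only which CPU each variant picks.

```python
def wake_affine_idle_mainline(this_idle, prev_idle, share_cache,
                              sync, this_nr_running):
    """Model of the '-' side of the hunk (v5.16.0-rc5 behaviour)."""
    if this_idle and share_cache:
        return "prev_cpu" if prev_idle else "this_cpu"
    if sync and this_nr_running == 1:
        return "this_cpu"
    return None  # decided by code not shown in the hunk


def wake_affine_idle_patched(this_idle, prev_idle, share_cache,
                             sync, this_nr_running):
    """Model of the '+' side of the hunk (dirty patch applied)."""
    if this_idle:
        return "this_cpu"
    if sync and this_nr_running == 1:
        return "this_cpu"
    return None  # decided by code not shown in the hunk


if __name__ == "__main__":
    # this_cpu idle, no sync wakeup; vary prev_cpu idleness and cache sharing.
    for prev_idle in (False, True):
        for share_cache in (False, True):
            args = (True, prev_idle, share_cache, False, 0)
            print(f"prev_idle={prev_idle!s:5} share_cache={share_cache!s:5} "
                  f"mainline={wake_affine_idle_mainline(*args)} "
                  f"patched={wake_affine_idle_patched(*args)}")
```

In this model the two variants disagree only when this_cpu is idle: either
the CPUs do not share cache (mainline makes no decision here, the patch
picks this_cpu), or prev_cpu is idle too (mainline picks prev_cpu, the
patch picks this_cpu). Judging by the commit titles, these appear to be the
two cases introduced by the commits found by bisection.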

- Dmatest results TL;DR -

System (1) before bad commit:
---------------------
[ 519.894642] dmatest: Added 1 threads using dma0chan0
[ 525.383021] dmatest: Started 1 threads using dma0chan0
[ 528.521915] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 98367.10 iops 6295494 KB/s (0)
[ 544.851751] dmatest: Added 1 threads using dma0chan0
[ 546.460064] dmatest: Started 1 threads using dma0chan0
[ 549.609504] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 100310.96 iops 6419901 KB/s (0)
[ 562.178365] dmatest: Added 1 threads using dma0chan0
[ 563.852534] dmatest: Started 1 threads using dma0chan0
[ 567.004898] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 98580.44 iops 6309148 KB/s (0)
---------------------

System (1) on HEAD=bad commit:
---------------------
[ 149.555401] dmatest: Added 1 threads using dma0chan0
[ 154.162444] dmatest: Started 1 threads using dma0chan0
[ 157.490868] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 26653.87 iops 1705847 KB/s (0)
[ 176.783450] dmatest: Added 1 threads using dma0chan0
[ 178.428518] dmatest: Started 1 threads using dma0chan0
[ 181.606531] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 14194.86 iops 908471 KB/s (0)
[ 192.125218] dmatest: Added 1 threads using dma0chan0
[ 194.060029] dmatest: Started 1 threads using dma0chan0
[ 197.235265] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 14757.09 iops 944454 KB/s (0)
---------------------

System (1) on v5.16.0-rc5:
---------------------
[ 1430.860170] dmatest: Added 1 threads using dma0chan0
[ 1437.367447] dmatest: Started 1 threads using dma0chan0
[ 1442.756660] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 24837.31 iops 1589588 KB/s (0)
[ 1561.614191] dmatest: Added 1 threads using dma0chan0
[ 1562.816375] dmatest: Started 1 threads using dma0chan0
[ 1566.619614] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 13666.05 iops 874627 KB/s (0)
[ 1585.019601] dmatest: Added 1 threads using dma0chan0
[ 1587.585741] dmatest: Started 1 threads using dma0chan0
[ 1591.386816] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 13521.91 iops 865402 KB/s (0)
---------------------

System (1) on v5.16.0-rc5 with dirty patch:
---------------------
[ 733.571508] dmatest: Added 1 threads using dma0chan0
[ 746.050800] dmatest: Started 1 threads using dma0chan0
[ 749.765600] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 87260.03 iops 5584642 KB/s (0)
[ 915.051955] dmatest: Added 1 threads using dma0chan0
[ 916.550732] dmatest: Started 1 threads using dma0chan0
[ 920.267525] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 88464.25 iops 5661712 KB/s (0)
[ 936.781273] dmatest: Added 1 threads using dma0chan0
[ 939.528616] dmatest: Started 1 threads using dma0chan0
[ 943.247694] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 88833.61 iops 5685351 KB/s (0)
---------------------

System (2) before bad commit:
---------------------
[ 481.309411] dmatest: Added 1 threads using dma0chan0
[ 491.197425] dmatest: Started 1 threads using dma0chan0
[ 497.047315] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 78988.94 iops 5055292 KB/s (0)
[ 506.057101] dmatest: Added 1 threads using dma0chan0
[ 508.939426] dmatest: Started 1 threads using dma0chan0
[ 514.788823] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 77754.44 iops 4976284 KB/s (0)
[ 531.894587] dmatest: Added 1 threads using dma0chan0
[ 534.053360] dmatest: Started 1 threads using dma0chan0
[ 539.906424] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 76988.21 iops 4927246 KB/s (0)
---------------------

System (2) on HEAD=bad commit:
---------------------
[44522.892995] dmatest: Added 1 threads using dma0chan0
[44526.193331] dmatest: Started 1 threads using dma0chan0
[44532.043932] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 80360.01 iops 5143040 KB/s (0)
[44561.121118] dmatest: Added 1 threads using dma0chan0
[44562.868428] dmatest: Started 1 threads using dma0chan0
[44568.808577] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 16080.53 iops 1029154 KB/s (0)
[44728.597409] dmatest: Added 1 threads using dma0chan0
[44730.301566] dmatest: Started 1 threads using dma0chan0
[44736.259009] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 16091.91 iops 1029882 KB/s (0)
---------------------

Thanks for reading.

--
Regards,
Alexander Fomichev