Re: [RFC] Scheduler: DMA Engine regression because of sched/fair changes

From: Alexander Fomichev
Date: Fri Jan 28 2022 - 11:51:12 EST


On Sun, Jan 23, 2022 at 07:33:14AM +0800, Hillf Danton wrote:
>
> Let's put pieces together based on the data collected.
>
> 1, No irq migration was observed.
>
> 2, Your patch that produced the highest iops so far
>
> -----< 5.15.8-ioat-ptdma-dirty-fix+ >-----
> [ 6183.356549] dmatest: Added 1 threads using dma0chan0
> [ 6187.868237] dmatest: Started 1 threads using dma0chan0
> [ 6187.887389] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 52753.74 iops 3376239 KB/s (0)
> [ 6201.913154] dmatest: Added 1 threads using dma0chan0
> [ 6204.701340] dmatest: Started 1 threads using dma0chan0
> [ 6204.720490] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 52614.96 iops 3367357 KB/s (0)
> [ 6285.114603] dmatest: Added 1 threads using dma0chan0
> [ 6287.031875] dmatest: Started 1 threads using dma0chan0
> [ 6287.050278] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 54939.01 iops 3516097 KB/s (0)
> -----< end >-----
>
>
> - if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
> - return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
> + if (available_idle_cpu(this_cpu))
> + return this_cpu;
>
> prefers this cpu if it is idle regardless of cache affinity.
>
> This implies task migration to the cpu that handled irq.
>
> 3, Given this cpu is different from the prev cpu in addition to no irq
> migration, the tested task was bouncing between the two cpus, with one
> more question rising, why was task migrated off from the irq-handling cpu?
>
> Despite no evidence, I bet no bounce occurred given iops observed.
>

IMHO, your assumptions are correct.
I've added per-CPU hit counters at every step of dmatest. They show that
the test task migrates between (at least 2) CPUs even in single-thread
mode. Below is the output for the vanilla 5.15.8 kernel:

-----< threads_per_chan=1 >-----
[19449.557950] dmatest: Added 1 threads using dma0chan0
[19469.238180] dmatest: Started 1 threads using dma0chan0
[19469.253962] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 65291.19 iops 4178636 KB/s (0)
[19469.253973] dmatest: CPUs hit: #52:417 #63:583 (times)
-----< end >-----

Note that the IRQ handler runs on CPU #52 in this environment.
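
For illustration, here is a minimal userspace sketch of this kind of
counter. It is not the actual dmatest instrumentation: the names
cpu_hits, cpu_hits_record(), cpu_hits_format() and MAX_CPUS are
hypothetical. In the kernel the CPU number would presumably come from
smp_processor_id() at each test step; here it is passed in explicitly
so the sketch stays deterministic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CPUS 1024	/* hypothetical upper bound on CPU numbers */

/* One counter per CPU: how many test steps were observed on that CPU. */
struct cpu_hits {
	unsigned int count[MAX_CPUS];
};

/* Record that one test step ran on 'cpu'; out-of-range values are ignored. */
static void cpu_hits_record(struct cpu_hits *h, int cpu)
{
	if (cpu >= 0 && cpu < MAX_CPUS)
		h->count[cpu]++;
}

/*
 * Format the non-zero counters as "#52:417 #63:583 (times)".
 * Returns the number of distinct CPUs seen; more than one means the
 * task migrated at least once during the run.
 */
static int cpu_hits_format(const struct cpu_hits *h, char *buf, size_t len)
{
	size_t off = 0;
	int distinct = 0;

	assert(h && buf && len > 0);
	buf[0] = '\0';

	for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
		if (!h->count[cpu])
			continue;
		distinct++;
		if (off < len) {
			int n = snprintf(buf + off, len - off, "#%d:%u ",
					 cpu, h->count[cpu]);
			if (n > 0)
				off += (size_t)n;
		}
	}
	if (off < len)
		snprintf(buf + off, len - off, "(times)");

	return distinct;
}
```

A return value of 1 from cpu_hits_format() corresponds to a task that
never left its CPU; anything larger indicates migration.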

In 4-thread mode the task migrates even more aggressively:

-----< threads_per_chan=4 >-----
[19355.460227] dmatest: Added 4 threads using dma0chan0
[19359.841182] dmatest: Started 4 threads using dma0chan0
[19359.860447] dmatest: dma0chan0-copy3: summary 1000 tests, 0 failures 53908.35 iops 3450134 KB/s (0)
[19359.860451] dmatest: dma0chan0-copy2: summary 1000 tests, 0 failures 54179.98 iops 3467519 KB/s (0)
[19359.860456] dmatest: CPUs hit: #50:1000 (times)
[19359.860459] dmatest: CPUs hit: #17:1000 (times)
[19359.860459] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 54048.21 iops 3459085 KB/s (0)
[19359.860466] dmatest: CPUs hit: #31:1000 (times)
[19359.861420] dmatest: dma0chan0-copy1: summary 1000 tests, 0 failures 51466.80 iops 3293875 KB/s (0)
[19359.861425] dmatest: CPUs hit: #30:213 #48:556 #52:231 (times)
-----< end >-----

On the other hand, with the dirty-patched kernel the task doesn't migrate:

-----< patched threads_per_chan=1 >-----
[ 2100.142002] dmatest: Added 1 threads using dma0chan0
[ 2102.359727] dmatest: Started 1 threads using dma0chan0
[ 2102.373594] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 76173.06 iops 4875076 KB/s (0)
[ 2102.373600] dmatest: CPUs hit: #49:1000 (times)
-----< end >-----

The IRQ handler runs on CPU #49 in this case.
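
To make the difference between the two kernels concrete, here is a
simplified standalone model of only the check touched by the quoted
hunk. It is a sketch, not the kernel code: struct wake_state,
pick_vanilla(), pick_patched() and NO_DECISION are hypothetical names,
the booleans stand in for the results of available_idle_cpu() and
cpus_share_cache(), and the later checks in the real function are not
modeled. The CPU numbers in the example are illustrative only; whether
those CPUs actually share a cache was not measured.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_DECISION (-1)	/* hypothetical: this check picked no CPU */

/* Inputs to the check, precomputed so the model has no kernel deps. */
struct wake_state {
	bool this_idle;		/* stands in for available_idle_cpu(this_cpu) */
	bool prev_idle;		/* stands in for available_idle_cpu(prev_cpu) */
	bool share_cache;	/* stands in for cpus_share_cache(this_cpu, prev_cpu) */
};

/* Vanilla rule: an idle prev_cpu wins when the two CPUs share a cache. */
static int pick_vanilla(int this_cpu, int prev_cpu, struct wake_state s)
{
	if (s.this_idle && s.share_cache)
		return s.prev_idle ? prev_cpu : this_cpu;
	return NO_DECISION;
}

/* Patched rule: an idle this_cpu wins regardless of cache affinity. */
static int pick_patched(int this_cpu, int prev_cpu, struct wake_state s)
{
	(void)prev_cpu;		/* not consulted by the patched check */
	if (s.this_idle)
		return this_cpu;
	return NO_DECISION;
}

/* Example: waker (IRQ) CPU 52, task last ran on CPU 63, both idle. */
void wake_pick_example(void)
{
	struct wake_state s = {
		.this_idle = true, .prev_idle = true, .share_cache = true,
	};

	assert(pick_vanilla(52, 63, s) == 63);	/* task goes back to prev_cpu */
	assert(pick_patched(52, 63, s) == 52);	/* task is pulled to the IRQ CPU */
}
```

Under these assumptions the vanilla rule can keep sending the task back
to prev_cpu, while the patched rule settles it on the CPU that handles
the IRQ.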

In 4-thread mode, however, the task still migrates. I think we should
consider that scenario irrelevant to this issue.

--
Regards,
Alexander