Re: [PATCH 21/24] sched/eevdf: Allow shorter slices to wakeup-preempt

From: Chen Yu
Date: Thu Aug 08 2024 - 06:16:29 EST


Hi Peter,

On 2024-07-27 at 12:27:53 +0200, Peter Zijlstra wrote:
> Part of the reason to have shorter slices is to improve
> responsiveness. Allow shorter slices to preempt longer slices on
> wakeup.
>
> Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms |
>
> 100ms massive_intr 500us cyclictest NO_PREEMPT_SHORT
>
> 1 massive_intr:(5) | 846018.956 ms | 779188 | avg: 0.273 ms | max: 58.337 ms | sum:212545.245 ms |
> 2 massive_intr:(5) | 853450.693 ms | 792269 | avg: 0.275 ms | max: 71.193 ms | sum:218263.588 ms |
> 3 massive_intr:(5) | 843888.920 ms | 771456 | avg: 0.277 ms | max: 92.405 ms | sum:213353.221 ms |
> 1 chromium-browse:(8) | 53015.889 ms | 131766 | avg: 0.463 ms | max: 36.341 ms | sum:60959.230 ms |
> 2 chromium-browse:(8) | 53864.088 ms | 136962 | avg: 0.480 ms | max: 27.091 ms | sum:65687.681 ms |
> 3 chromium-browse:(9) | 53637.904 ms | 132637 | avg: 0.481 ms | max: 24.756 ms | sum:63781.673 ms |
> 1 cyclictest:(5) | 12615.604 ms | 639689 | avg: 0.471 ms | max: 32.272 ms | sum:301351.094 ms |
> 2 cyclictest:(5) | 12511.583 ms | 642578 | avg: 0.448 ms | max: 44.243 ms | sum:287632.830 ms |
> 3 cyclictest:(5) | 12545.867 ms | 635953 | avg: 0.475 ms | max: 25.530 ms | sum:302374.658 ms |
>
> 100ms massive_intr 500us cyclictest PREEMPT_SHORT
>
> 1 massive_intr:(5) | 839843.919 ms | 837384 | avg: 0.264 ms | max: 74.366 ms | sum:221476.885 ms |
> 2 massive_intr:(5) | 852449.913 ms | 845086 | avg: 0.252 ms | max: 68.162 ms | sum:212595.968 ms |
> 3 massive_intr:(5) | 839180.725 ms | 836883 | avg: 0.266 ms | max: 69.742 ms | sum:222812.038 ms |
> 1 chromium-browse:(11) | 54591.481 ms | 138388 | avg: 0.458 ms | max: 35.427 ms | sum:63401.508 ms |
> 2 chromium-browse:(8) | 52034.541 ms | 132276 | avg: 0.436 ms | max: 31.826 ms | sum:57732.958 ms |
> 3 chromium-browse:(8) | 55231.771 ms | 141892 | avg: 0.469 ms | max: 27.607 ms | sum:66538.697 ms |
> 1 cyclictest:(5) | 13156.391 ms | 667412 | avg: 0.373 ms | max: 38.247 ms | sum:249174.502 ms |
> 2 cyclictest:(5) | 12688.939 ms | 665144 | avg: 0.374 ms | max: 33.548 ms | sum:248509.392 ms |
> 3 cyclictest:(5) | 13475.623 ms | 669110 | avg: 0.370 ms | max: 37.819 ms | sum:247673.390 ms |
>
> As per the numbers the, this makes cyclictest (short slice) it's
> max-delay more consistent and consistency drops the sum-delay. The
> trade-off is that the massive_intr (long slice) gets more context
> switches and a slight increase in sum-delay.
>
> [mike: numbers]
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Tested-by: Mike Galbraith <umgwanakikbuti@xxxxxxxxx>

Besides this short preemption, it seems that an important patch is missing from
this patch set, that was originally from Prateek and you refined it to fix the
current task's false negative eligibility:
https://lore.kernel.org/lkml/20240424150721.GQ30852@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

The RESPECT_SLICE is introduced to honor the current task's slice during wakeup preemption.
Without it we got reported that over-preemption and performance downgrading are observed
when running SPECjbb on servers.

echo RESPECT_SLICE > /sys/kernel/debug/sched/features

echo RUN_TO_PARITY > /sys/kernel/debug/sched/features
@task_duration_usecs_before_preempted:
[2, 4) 8732 |@@@ |
[4, 8) 109400 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[8, 16) 95815 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[16, 32) 110647 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[32, 64) 131298 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[64, 128) 132566 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[128, 256) 82095 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[256, 512) 33771 |@@@@@@@@@@@@@ |
[512, 1K) 24180 |@@@@@@@@@ |
[1K, 2K) 31056 |@@@@@@@@@@@@ |
[2K, 4K) 117533 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[4K, 8K) 4472 |@ |
[8K, 16K) 1149 | |
[16K, 32K) 289 | |
[32K, 64K) 110 | |
[64K, 128K) 20 | |
[128K, 256K) 4 | |


echo NO_RUN_TO_PARITY > /sys/kernel/debug/sched/features
@task_duration_usecs_before_preempted:
[4, 8) 1 | |
[8, 16) 12 | |
[16, 32) 20 | |
[32, 64) 38 | |
[64, 128) 64 | |
[128, 256) 98 | |
[256, 512) 248 | |
[512, 1K) 1196 | |
[1K, 2K) 3456 | |
[2K, 4K) 417269 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K) 22893 |@@ |
[8K, 16K) 7818 | |
[16K, 32K) 1471 | |
[32K, 64K) 373 | |
[64K, 128K) 96 | |
[128K, 256K) 3 | |

We can see that without the fix, the task will be preempted and can not reach
its time slice budget(we enlarge its slice).

May I know if we can put that patch into this series please?

thanks,
Chenyu