Re: [RFC PATCH 1/1] sched/eevdf: Skip eligibility check for current entity during wakeup preemption
From: K Prateek Nayak
Date: Wed Apr 17 2024 - 02:08:45 EST
Hello Youssef,
On 3/26/2024 8:36 AM, K Prateek Nayak wrote:
>> [..snip..]
>>
>> Thanks for sharing this Prateek.
>> We actually noticed we could also gain performance by disabling
>> eligibility checks (but disable it on all paths).
>> The following are a few threads we had on the topic:
>>
>> Discussion around eligibility:
>> https://lore.kernel.org/lkml/CA+q576MS0-MV1Oy-eecvmYpvNT3tqxD8syzrpxQ-Zk310hvRbw@xxxxxxxxxxxxxx/
>> Some of our results:
>> https://lore.kernel.org/lkml/CA+q576Mov1jpdfZhPBoy_hiVh3xSWuJjXdP3nS4zfpqfOXtq7Q@xxxxxxxxxxxxxx/
>> Sched feature to disable eligibility:
>> https://lore.kernel.org/lkml/20231013030213.2472697-1-youssefesmat@xxxxxxxxxxxx/
>>
>
> Thank you for pointing me to the discussions. I'll give this a spin on
> my machine and report back what I see. Hope some of it will help during
> the OSPM discussion :)
Sorry about the delay, but on a positive note, I do not see any
concerning regressions after dropping the eligibility check. I'll
leave the full results from my testing below.
o System Details
- 3rd Generation EPYC System
- 2 x 64C/128T
- NPS1 mode
o Kernels
tip: tip:sched/core at commit 4475cd8bfd9b
("sched/balancing: Simplify the sg_status
bitmask and use separate ->overloaded and
->overutilized flags")
eie: (everyone is eligible)
tip + vruntime_eligible() and entity_eligible()
always return true.
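For context, the following is a minimal standalone model of what the eie
variant changes; this is illustrative C, not the actual kernel code, and
the struct and helper names are my own. Under EEVDF, an entity is
considered eligible when its vruntime is at or below the runqueue's
load-weighted average vruntime (i.e. it has non-negative lag); the eie
kernel simply short-circuits that check to true:

```c
/*
 * Illustrative model of EEVDF eligibility; names are hypothetical,
 * not the kernel's. An entity is eligible when its vruntime is at
 * or below the load-weighted average vruntime of the queue.
 */
struct entity {
	long long vruntime;
	unsigned long weight;	/* load weight, e.g. 1024 for nice 0 */
};

/* load-weighted average vruntime across the queue */
static long long weighted_avg_vruntime(const struct entity *q, int n)
{
	long long sum = 0;
	long long wsum = 0;

	for (int i = 0; i < n; i++) {
		sum += q[i].vruntime * (long long)q[i].weight;
		wsum += (long long)q[i].weight;
	}
	return sum / wsum;
}

/* stock check: eligible iff vruntime <= weighted average */
static int entity_eligible(const struct entity *q, int n,
			   const struct entity *se)
{
	return se->vruntime <= weighted_avg_vruntime(q, n);
}

/* the "eie" variant: eligibility is unconditionally true */
static int entity_eligible_eie(const struct entity *q, int n,
			       const struct entity *se)
{
	(void)q; (void)n; (void)se;
	return 1;
}
```

With the eie variant, pick_eevdf() effectively falls back to choosing
purely on deadline, which is where the behavioral difference in the
results below comes from.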
o Results
==================================================================
Test : hackbench
Units : Normalized time in seconds
Interpretation: Lower is better
Statistic : AMean
==================================================================
Case: tip[pct imp](CV) eie[pct imp](CV)
1-groups 1.00 [ -0.00]( 1.94) 0.95 [ 5.11]( 2.56)
2-groups 1.00 [ -0.00]( 2.41) 0.97 [ 2.80]( 1.52)
4-groups 1.00 [ -0.00]( 1.16) 0.95 [ 5.01]( 1.04)
8-groups 1.00 [ -0.00]( 1.72) 0.96 [ 4.37]( 1.01)
16-groups 1.00 [ -0.00]( 2.16) 0.94 [ 5.88]( 2.30)
==================================================================
Test : tbench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) eie[pct imp](CV)
1 1.00 [ 0.00]( 0.69) 1.00 [ 0.05]( 0.61)
2 1.00 [ 0.00]( 0.25) 1.00 [ 0.06]( 0.51)
4 1.00 [ 0.00]( 1.04) 0.98 [ -1.69]( 1.21)
8 1.00 [ 0.00]( 0.72) 1.00 [ -0.13]( 0.56)
16 1.00 [ 0.00]( 2.40) 1.00 [ 0.43]( 0.63)
32 1.00 [ 0.00]( 0.62) 0.98 [ -1.80]( 2.18)
64 1.00 [ 0.00]( 1.19) 0.98 [ -2.13]( 1.26)
128 1.00 [ 0.00]( 0.91) 1.00 [ 0.37]( 0.50)
256 1.00 [ 0.00]( 0.52) 1.00 [ -0.11]( 0.21)
512 1.00 [ 0.00]( 0.36) 1.02 [ 1.54]( 0.58)
1024 1.00 [ 0.00]( 0.26) 1.01 [ 1.21]( 0.41)
==================================================================
Test : stream-10
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) eie[pct imp](CV)
Copy 1.00 [ 0.00]( 5.01) 1.01 [ 1.27]( 4.63)
Scale 1.00 [ 0.00]( 6.93) 1.03 [ 2.66]( 5.20)
Add 1.00 [ 0.00]( 5.94) 1.03 [ 3.41]( 4.99)
Triad 1.00 [ 0.00]( 6.40) 0.95 [ -4.69]( 8.29)
==================================================================
Test : stream-100
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) eie[pct imp](CV)
Copy 1.00 [ 0.00]( 2.84) 1.00 [ -0.37]( 2.44)
Scale 1.00 [ 0.00]( 5.26) 1.00 [ 0.21]( 3.88)
Add 1.00 [ 0.00]( 4.98) 1.00 [ 0.11]( 1.15)
Triad 1.00 [ 0.00]( 1.60) 0.96 [ -3.72]( 5.26)
==================================================================
Test : netperf
Units : Normalized Throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) eie[pct imp](CV)
1-clients 1.00 [ 0.00]( 0.90) 1.00 [ -0.09]( 0.16)
2-clients 1.00 [ 0.00]( 0.77) 0.99 [ -0.89]( 0.97)
4-clients 1.00 [ 0.00]( 0.63) 0.99 [ -1.03]( 1.53)
8-clients 1.00 [ 0.00]( 0.52) 0.99 [ -0.86]( 1.66)
16-clients 1.00 [ 0.00]( 0.43) 0.99 [ -0.91]( 0.79)
32-clients 1.00 [ 0.00]( 0.88) 0.98 [ -2.37]( 1.42)
64-clients 1.00 [ 0.00]( 1.63) 0.96 [ -4.07]( 0.91) *
128-clients 1.00 [ 0.00]( 0.94) 1.00 [ -0.30]( 0.94)
256-clients 1.00 [ 0.00]( 5.08) 0.95 [ -4.95]( 3.36)
512-clients 1.00 [ 0.00](51.89) 0.99 [ -0.93](51.00)
* This seems to be the only point of regression with low CV. I'll
rerun this and report back if I see a consistent dip, but for the
time being I'm not worried.
==================================================================
Test : schbench
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) eie[pct imp](CV)
1 1.00 [ -0.00](30.01) 0.97 [ 3.12](14.32)
2 1.00 [ -0.00](26.14) 1.23 [-22.58](13.48)
4 1.00 [ -0.00](13.22) 1.00 [ -0.00]( 6.04)
8 1.00 [ -0.00]( 6.23) 1.00 [ -0.00](13.09)
16 1.00 [ -0.00]( 3.49) 1.02 [ -1.69]( 3.43)
32 1.00 [ -0.00]( 2.20) 0.98 [ 2.13]( 2.47)
64 1.00 [ -0.00]( 7.17) 0.88 [ 12.50]( 3.18)
128 1.00 [ -0.00]( 2.79) 1.02 [ -2.46]( 8.29)
256 1.00 [ -0.00](13.02) 1.01 [ -1.34](37.58)
512 1.00 [ -0.00]( 4.27) 0.79 [ 21.49]( 2.41)
==================================================================
Test : DeathStarBench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : Mean
==================================================================
Pinning scaling tip eie (pct imp)
1CCD 1 1.00 1.15 (%diff: 15.68%)
2CCD 2 1.00 0.99 (%diff: -1.12%)
4CCD 4 1.00 1.11 (%diff: 11.65%)
8CCD 8 1.00 1.05 (%diff: 4.98%)
--
>
> [..snip..]
>
I'll try to get data from more workloads and will update the thread
when it arrives.
--
Thanks and Regards,
Prateek