Re: [RESEND][PATCH v8 0/7] Preparatory changes for Proxy Execution v8

From: K Prateek Nayak
Date: Wed Feb 28 2024 - 12:37:58 EST


Hello John,

On 2/28/2024 10:54 AM, John Stultz wrote:
> On Tue, Feb 27, 2024 at 9:12 PM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>> On 2/28/2024 10:21 AM, John Stultz wrote:
>>> Just to clarify: by "this series" did you test just the 7 preparatory
>>> patches submitted to the list here, or did you pull the full
>>> proxy-exec-v8-6.8-rc3 set from git?
>>
>> Just these preparatory patches for now. On my way to queue a run for the
>> whole set from your tree. I'll use the "proxy-exec-v8-6.8-rc3" branch and
>> pick the commits past the
>> "[ANNOTATION] === Proxy Exec patches past this point ===" till the commit
>> ff90fb583a81 ("FIX: Avoid using possibly uninitialized cpu value with
>> activate_blocked_entities()") on top of the tip:sched/core mentioned
>> above since it'll allow me to reuse the baseline numbers :)
>>
>
> Ah, thank you for the clarification!
>
> Also, I really appreciate your testing with the rest of the series as
> well. It will be good to have any potential problems identified early

I got a chance to test the full v8 patch set on the same dual-socket
3rd Generation EPYC system:

tl;dr

- There is a slight regression in hackbench, but instead of the 10x
  blowup seen previously, it is only around 5%, with the overloaded
  case not regressing at all.

- A small but consistent (~2-3%) regression is seen in tbench and
  netperf.

- schbench is inconclusive due to run-to-run variance, and stream is
  perf neutral with proxy execution.

I have not yet looked into the regressions. I'll let you know if I
spot anything once I dig deeper. Below are the full results:

o System Details

- 3rd Generation EPYC System
- 2 x 64C/128T
- NPS1 mode

o Kernels

tip:              tip:sched/core at commit 8cec3dd9e593
                  ("sched/core: Simplify code by removing
                  duplicate #ifdefs")

proxy-exec-full:  tip + proxy execution commits from
                  "proxy-exec-v8-6.8-rc3" described previously in
                  this thread.

o Results

==================================================================
Test          : hackbench
Units         : Normalized time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Case:          tip[pct imp](CV)      proxy-exec-full[pct imp](CV)
 1-groups     1.00 [ -0.00]( 2.08)     1.00 [ -0.18]( 3.90)
 2-groups     1.00 [ -0.00]( 0.89)     1.04 [ -4.43]( 0.78)
 4-groups     1.00 [ -0.00]( 0.81)     1.05 [ -4.82]( 1.03)
 8-groups     1.00 [ -0.00]( 0.78)     1.02 [ -1.90]( 1.00)
16-groups     1.00 [ -0.00]( 1.60)     1.01 [ -0.80]( 1.18)
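
For readers unfamiliar with the column layout, here is a minimal sketch of how a row of the form "normalized [pct imp](CV)" can be derived from raw per-run samples. The function name `summarize`, the use of the sample standard deviation for CV, and the input values are assumptions for illustration only; they are not taken from the tooling that produced the tables in this mail.

```python
from statistics import mean, stdev


def summarize(baseline, candidate, lower_is_better):
    """Return (normalized, pct_imp, cv) for candidate relative to baseline.

    normalized: candidate mean divided by baseline mean.
    pct_imp:    percent improvement; the sign is flipped for
                "lower is better" metrics so that a negative value
                always means a regression.
    cv:         coefficient of variation of the candidate, in percent
                (sample standard deviation is an assumption here).
    """
    base_mean = mean(baseline)
    cand_mean = mean(candidate)
    normalized = cand_mean / base_mean
    if lower_is_better:
        delta = base_mean - cand_mean
    else:
        delta = cand_mean - base_mean
    pct_imp = 100.0 * delta / base_mean
    cv = 100.0 * stdev(candidate) / cand_mean
    return normalized, pct_imp, cv


# Hypothetical run times in seconds (made up, not from the runs above).
tip_runs = [10.0, 10.0, 10.0]
proxy_runs = [10.4, 10.5, 10.6]
normalized, pct_imp, cv = summarize(tip_runs, proxy_runs, lower_is_better=True)
print(f"{normalized:.2f} [{pct_imp:6.2f}]({cv:5.2f})")
```

With this convention a time-based result that is 5% slower shows up as 1.05 with a pct imp of -5.00, which matches how the hackbench and schbench rows read.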


==================================================================
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:       tip[pct imp](CV)      proxy-exec-full[pct imp](CV)
    1         1.00 [  0.00]( 0.71)     0.97 [ -3.00]( 0.15)
    2         1.00 [  0.00]( 0.25)     0.97 [ -3.35]( 0.98)
    4         1.00 [  0.00]( 0.85)     0.97 [ -3.26]( 1.40)
    8         1.00 [  0.00]( 1.00)     0.97 [ -2.75]( 0.46)
   16         1.00 [  0.00]( 1.25)     0.99 [ -1.27]( 0.11)
   32         1.00 [  0.00]( 0.35)     0.98 [ -2.42]( 0.06)
   64         1.00 [  0.00]( 0.71)     0.97 [ -2.76]( 1.81)
  128         1.00 [  0.00]( 0.46)     0.97 [ -2.67]( 0.88)
  256         1.00 [  0.00]( 0.24)     0.98 [ -1.97]( 0.98)
  512         1.00 [  0.00]( 0.30)     0.98 [ -2.41]( 0.38)
 1024         1.00 [  0.00]( 0.40)     0.98 [ -2.21]( 0.11)


==================================================================
Test          : stream-10
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:          tip[pct imp](CV)      proxy-exec-full[pct imp](CV)
 Copy         1.00 [  0.00]( 9.73)     1.00 [  0.26]( 6.36)
Scale         1.00 [  0.00]( 5.57)     1.02 [  1.59]( 2.98)
  Add         1.00 [  0.00]( 5.43)     1.00 [  0.48]( 2.77)
Triad         1.00 [  0.00]( 5.50)     0.98 [ -2.18]( 6.06)


==================================================================
Test          : stream-100
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:          tip[pct imp](CV)      proxy-exec-full[pct imp](CV)
 Copy         1.00 [  0.00]( 3.26)     0.98 [ -1.96]( 3.24)
Scale         1.00 [  0.00]( 1.26)     0.96 [ -3.61]( 6.41)
  Add         1.00 [  0.00]( 1.47)     0.98 [ -1.84]( 4.14)
Triad         1.00 [  0.00]( 1.77)     1.00 [  0.27]( 2.60)
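
The stream tables report HMean rather than AMean. As a small sketch of the difference, the snippet below computes both for the same set of samples using Python's standard `statistics` module. The helper name `hmean_vs_amean` and the sample values are hypothetical and are not taken from the runs above.

```python
from statistics import harmonic_mean, mean


def hmean_vs_amean(samples):
    """Return (harmonic mean, arithmetic mean) of positive rate samples."""
    return harmonic_mean(samples), mean(samples)


# Hypothetical bandwidth samples in MB/s (made up for illustration).
samples = [100.0, 200.0, 400.0]
h, a = hmean_vs_amean(samples)
print(f"HMean = {h:.2f} MB/s, AMean = {a:.2f} MB/s")
```

The harmonic mean is never larger than the arithmetic mean for positive samples, so a single unusually fast run inflates it less, which is one common reason to prefer it for rate-like metrics such as bandwidth.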


==================================================================
Test          : netperf
Units         : Normalized Throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:       tip[pct imp](CV)      proxy-exec-full[pct imp](CV)
  1-clients   1.00 [  0.00]( 0.22)     0.97 [ -3.01]( 0.40)
  2-clients   1.00 [  0.00]( 0.57)     0.97 [ -3.25]( 0.45)
  4-clients   1.00 [  0.00]( 0.43)     0.97 [ -3.26]( 0.59)
  8-clients   1.00 [  0.00]( 0.27)     0.97 [ -2.83]( 0.55)
 16-clients   1.00 [  0.00]( 0.46)     0.97 [ -2.99]( 0.65)
 32-clients   1.00 [  0.00]( 0.95)     0.97 [ -2.98]( 0.71)
 64-clients   1.00 [  0.00]( 1.79)     0.97 [ -2.61]( 1.38)
128-clients   1.00 [  0.00]( 0.89)     0.97 [ -2.72]( 0.94)
256-clients   1.00 [  0.00]( 3.88)     0.98 [ -1.89]( 2.92)
512-clients   1.00 [  0.00](35.06)     0.99 [ -0.78](47.83)


==================================================================
Test          : schbench
Units         : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers:      tip[pct imp](CV)      proxy-exec-full[pct imp](CV)
    1         1.00 [ -0.00](27.28)     1.31 [-31.25]( 6.45)
    2         1.00 [ -0.00]( 3.85)     0.95 [  5.00](10.02)
    4         1.00 [ -0.00](14.00)     1.11 [-10.53]( 1.36)
    8         1.00 [ -0.00]( 4.68)     1.15 [-14.58](14.55)
   16         1.00 [ -0.00]( 4.08)     0.98 [  1.61]( 3.28)
   32         1.00 [ -0.00]( 6.68)     1.02 [ -2.04]( 1.71)
   64         1.00 [ -0.00]( 1.79)     1.12 [-11.73]( 7.08)
  128         1.00 [ -0.00]( 6.30)     1.11 [-10.84]( 5.52)
  256         1.00 [ -0.00](43.39)     1.37 [-37.14](20.11)
  512         1.00 [ -0.00]( 2.26)     0.99 [  1.17]( 1.43)
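
One rough way to read the schbench rows against their run-to-run variance is to compare the size of each delta with the larger of the two CVs. The sketch below is only a crude screen under that assumption, not a statistical test and not the method used to reach the conclusion in this mail. The helper `within_noise` is hypothetical; the four input rows are copied from the table above.

```python
def within_noise(pct_imp, cv_base, cv_new):
    """Return True when |pct_imp| does not exceed the noisier side's CV.

    This is a simple heuristic (an assumption of this sketch), useful
    only as a first pass before looking at the raw distributions.
    """
    return abs(pct_imp) <= max(cv_base, cv_new)


# (pct imp, CV of tip, CV of proxy-exec-full) for selected #workers rows.
rows = {
    1: (-31.25, 27.28, 6.45),
    4: (-10.53, 14.00, 1.36),
    64: (-11.73, 1.79, 7.08),
    256: (-37.14, 43.39, 20.11),
}

for workers, (pct, cv_tip, cv_proxy) in rows.items():
    verdict = "within noise" if within_noise(pct, cv_tip, cv_proxy) else "outside noise"
    print(f"{workers:>4} workers: {pct:7.2f}% -> {verdict}")
```

By this screen some rows fall inside the noise band and others outside it, so the mixed picture is consistent with treating the schbench data as inconclusive until the variance is reduced.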


==================================================================
Test          : Unixbench
Units         : Normalized scores
Interpretation: Lower is better
Statistic     : Various (Mentioned)
==================================================================
Metric    Variant                       tip     proxy-exec-full
Hmean     unixbench-dhry2reg-1         0.00%        -0.67%
Hmean     unixbench-dhry2reg-512       0.00%         0.14%
Amean     unixbench-syscall-1          0.00%        -0.86%
Amean     unixbench-syscall-512        0.00%        -6.42%
Hmean     unixbench-pipe-1             0.00%         0.79%
Hmean     unixbench-pipe-512           0.00%         0.57%
Hmean     unixbench-spawn-1            0.00%        -3.91%
Hmean     unixbench-spawn-512          0.00%         3.17%
Hmean     unixbench-execl-1            0.00%        -1.18%
Hmean     unixbench-execl-512          0.00%         1.26%
--

> (I'm trying to get v9 ready as soon as I can here, as its fixed a
> number of smaller issues - However, I've also managed to uncover some
> new problems in stress testing, so we'll see how quickly I can chase
> those down).

I haven't seen any splats when running the above tests. I'll test some
larger workloads next. Please let me know if you would like me to test
any specific workload or need additional data from these tests :)

>
> thanks
> -john

--
Thanks and Regards,
Prateek