Re: [PATCH 0/6 v2] sched/eevdf: Improve scheduling latency of short slice task
From: Vincent Guittot
Date: Tue Jun 16 2026 - 09:59:05 EST
On Tue, 16 Jun 2026 at 09:44, K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>
> Hello Vincent,
>
> On 6/15/2026 9:54 PM, Vincent Guittot wrote:
> > This series continues to improve the scheduling latency of tasks with
> > shorter slice duration by mainly canceling, updating or minimizing the
> > protection of the running tasks when appropriate.
> >
> > Benchmarks, like hackbench, haven't seen any noticeable performance
> > differences with this patchset (The default 2.8ms slice has been used for
> > testing performance regressions)
>
> I've left a few comments on the thread but for the vanilla runs, I too
> can confirm there aren't any noticeable differences without slice tuning.
>
> Following are results from a dual socket 4th Generation EPYC system
> (2 x 128C/256T) with the series applied on top of
> "sched-core-2026-06-14":
>
> ==================================================================
> Test : hackbench
> Units : Normalized time in seconds
> Interpretation: Lower is better
> Statistic : AMean
> ==================================================================
> Case: tip[pct imp](CV) preempt_short_opt[pct imp](CV)
> 1-groups 1.00 [ -0.00]( 9.66) 0.86 [ 14.32](14.00)
> 2-groups 1.00 [ -0.00]( 9.22) 1.02 [ -1.78](10.45)
> 4-groups 1.00 [ -0.00]( 2.14) 0.98 [ 2.33]( 1.99)
> 8-groups 1.00 [ -0.00]( 2.80) 0.97 [ 2.88]( 2.93)
> 16-groups 1.00 [ -0.00]( 5.54) 1.00 [ 0.49]( 2.58)
>
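For readers unfamiliar with the column layout used in these tables, here is
a minimal sketch of how each "value [pct imp](CV)" triple could be derived
from raw samples. The formulas are an assumption based on the column names
(mean normalized to the tip baseline, percentage improvement relative to
tip, coefficient of variation in percent); they are not taken from the
scripts that produced the report, and the function name and sample values
are hypothetical.

```python
import statistics

def summarize(samples, baseline_mean, lower_is_better):
    """Return (normalized value, pct improvement, CV in percent).

    Assumed definitions, not confirmed by the report:
      normalized = mean(samples) / baseline_mean
      pct imp    = relative change vs. baseline, sign flipped when a
                   lower value is the better outcome
      CV         = sample standard deviation / mean * 100
    """
    mean = statistics.fmean(samples)
    cv = statistics.stdev(samples) / mean * 100.0
    normalized = mean / baseline_mean
    change = (mean - baseline_mean) / baseline_mean * 100.0
    pct_imp = -change if lower_is_better else change
    return normalized, pct_imp, cv

# Hypothetical run times in seconds against a hypothetical 10.0s baseline.
normalized, pct_imp, cv = summarize([8.0, 9.0, 10.0], 10.0, lower_is_better=True)
print(f"{normalized:.2f} [{pct_imp:6.2f}]({cv:5.2f})")
```

With these definitions a positive "pct imp" always means the patched kernel
did better, whichever direction the underlying metric runs.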
>
> ==================================================================
> Test : tbench
> Units : Normalized throughput
> Interpretation: Higher is better
> Statistic : AMean
> ==================================================================
> Clients: tip[pct imp](CV) preempt_short_opt[pct imp](CV)
> 1 1.00 [ 0.00]( 0.03) 1.00 [ 0.36]( 0.20)
> 2 1.00 [ 0.00]( 0.32) 1.00 [ 0.09]( 0.14)
> 4 1.00 [ 0.00]( 0.34) 1.00 [ 0.39]( 0.28)
> 8 1.00 [ 0.00]( 0.24) 1.00 [ 0.01]( 0.24)
> 16 1.00 [ 0.00]( 0.45) 1.00 [ 0.01]( 0.47)
> 32 1.00 [ 0.00]( 0.58) 1.01 [ 0.75]( 0.30)
> 64 1.00 [ 0.00]( 0.81) 1.02 [ 1.62]( 0.62)
> 128 1.00 [ 0.00]( 0.53) 1.03 [ 3.24]( 0.25)
> 256 1.00 [ 0.00]( 0.30) 1.00 [ 0.39]( 0.26)
> 512 1.00 [ 0.00]( 3.73) 1.01 [ 1.47]( 1.13)
> 1024 1.00 [ 0.00]( 0.23) 1.00 [ -0.10]( 0.37)
> 2048 1.00 [ 0.00]( 0.14) 1.00 [ 0.29]( 0.19)
>
>
> ==================================================================
> Test : stream-10
> Units : Normalized Bandwidth, MB/s
> Interpretation: Higher is better
> Statistic : HMean
> ==================================================================
> Test: tip[pct imp](CV) preempt_short_opt[pct imp](CV)
> Copy 1.00 [ 0.00]( 0.66) 0.99 [ -0.70]( 1.66)
> Scale 1.00 [ 0.00]( 0.89) 0.99 [ -0.77]( 1.52)
> Add 1.00 [ 0.00]( 0.73) 1.00 [ -0.34]( 1.31)
> Triad 1.00 [ 0.00]( 0.70) 0.99 [ -0.52]( 1.24)
>
>
> ==================================================================
> Test : stream-100
> Units : Normalized Bandwidth, MB/s
> Interpretation: Higher is better
> Statistic : HMean
> ==================================================================
> Test: tip[pct imp](CV) preempt_short_opt[pct imp](CV)
> Copy 1.00 [ 0.00]( 0.32) 1.00 [ 0.07]( 0.36)
> Scale 1.00 [ 0.00]( 0.26) 1.00 [ -0.00]( 0.45)
> Add 1.00 [ 0.00]( 0.29) 1.00 [ -0.05]( 0.39)
> Triad 1.00 [ 0.00]( 0.27) 1.00 [ -0.05]( 0.37)
>
>
> ==================================================================
> Test : netperf
> Units : Normalized Throughput
> Interpretation: Higher is better
> Statistic : AMean
> ==================================================================
> Clients: tip[pct imp](CV) preempt_short_opt[pct imp](CV)
> 1-clients 1.00 [ 0.00]( 0.10) 1.00 [ 0.02]( 0.19)
> 2-clients 1.00 [ 0.00]( 0.29) 1.00 [ 0.02]( 0.18)
> 4-clients 1.00 [ 0.00]( 0.36) 1.00 [ -0.01]( 0.23)
> 8-clients 1.00 [ 0.00]( 0.32) 1.00 [ 0.04]( 0.22)
> 16-clients 1.00 [ 0.00]( 0.24) 1.00 [ 0.09]( 0.22)
> 32-clients 1.00 [ 0.00]( 0.42) 1.00 [ 0.30]( 0.33)
> 64-clients 1.00 [ 0.00]( 0.94) 1.00 [ 0.48]( 0.67)
> 128-clients 1.00 [ 0.00]( 1.10) 1.01 [ 0.77]( 1.31)
> 256-clients 1.00 [ 0.00]( 1.06) 1.02 [ 2.02]( 1.18)
> 512-clients 1.00 [ 0.00]( 4.68) 0.99 [ -1.14]( 5.63)
> 768-clients 1.00 [ 0.00](34.35) 0.99 [ -1.00](34.84)
> 1024-clients 1.00 [ 0.00](42.76) 0.99 [ -0.81](45.77)
>
>
> ==================================================================
> Test : schbench
> Units : Normalized 99th percentile latency in us
> Interpretation: Lower is better
> Statistic : Median
> ==================================================================
> #workers: tip[pct imp](CV) preempt_short_opt[pct imp](CV)
> 1 1.00 [ -0.00](18.94) 0.39 [ 61.36]( 5.88)
> 2 1.00 [ -0.00]( 1.67) 0.91 [ 8.57]( 6.64)
> 4 1.00 [ -0.00]( 9.79) 0.84 [ 16.22]( 7.78)
> 8 1.00 [ -0.00]( 2.27) 0.89 [ 11.36](10.54)
> 16 1.00 [ -0.00]( 0.00) 0.98 [ 1.79]( 4.10)
> 32 1.00 [ -0.00]( 1.92) 1.01 [ -1.25]( 0.72)
> 64 1.00 [ -0.00]( 1.19) 1.01 [ -0.78]( 1.18)
> 128 1.00 [ -0.00]( 0.67) 1.01 [ -1.32]( 0.25)
> 256 1.00 [ -0.00]( 0.46) 1.03 [ -3.08]( 4.37)
> 512 1.00 [ -0.00]( 0.33) 1.01 [ -0.66]( 0.38)
> 768 1.00 [ -0.00]( 4.69) 1.02 [ -1.55](10.18)
> 1024 1.00 [ -0.00]( 2.71) 1.00 [ -0.00]( 4.43)
>
>
> ==================================================================
> Test : new-schbench-requests-per-second
> Units : Normalized Requests per second
> Interpretation: Higher is better
> Statistic : Median
> ==================================================================
> #workers: tip[pct imp](CV) preempt_short_opt[pct imp](CV)
> 1 1.00 [ 0.00]( 0.15) 0.99 [ -0.59]( 0.15)
> 2 1.00 [ 0.00]( 0.00) 0.99 [ -0.59]( 0.15)
> 4 1.00 [ 0.00]( 0.00) 0.99 [ -0.88]( 0.31)
> 8 1.00 [ 0.00]( 0.15) 1.00 [ 0.00]( 0.00)
> 16 1.00 [ 0.00]( 0.15) 1.00 [ 0.00]( 0.00)
> 32 1.00 [ 0.00]( 0.15) 1.00 [ 0.00]( 0.15)
> 64 1.00 [ 0.00]( 0.00) 1.00 [ 0.00]( 0.00)
> 128 1.00 [ 0.00](12.53) 0.98 [ -1.77](15.94)
> 256 1.00 [ 0.00]( 0.15) 0.99 [ -0.85]( 0.39)
> 512 1.00 [ 0.00]( 0.84) 1.00 [ 0.00]( 0.84)
> 768 1.00 [ 0.00]( 2.05) 0.99 [ -0.94]( 1.94)
> 1024 1.00 [ 0.00]( 2.90) 1.01 [ 1.35]( 2.18)
>
>
> ==================================================================
> Test : new-schbench-wakeup-latency
> Units : Normalized 99th percentile latency in us
> Interpretation: Lower is better
> Statistic : Median
> ==================================================================
> #workers: tip[pct imp](CV) preempt_short_opt[pct imp](CV)
> 1 1.00 [ -0.00](12.99) 1.47 [-46.67](27.72)
> 2 1.00 [ -0.00]( 4.08) 0.85 [ 15.38]( 0.00)
> 4 1.00 [ -0.00]( 0.00) 0.82 [ 18.18]( 0.00)
> 8 1.00 [ -0.00]( 0.00) 1.27 [-27.27]( 3.78)
> 16 1.00 [ -0.00]( 4.56) 1.27 [-27.27]( 0.00)
> 32 1.00 [ -0.00]( 0.00) 1.00 [ -0.00]( 4.56)
> 64 1.00 [ -0.00]( 5.00) 1.00 [ -0.00]( 5.00)
> 128 1.00 [ -0.00]( 7.45) 1.25 [-25.00](14.68)
> 256 1.00 [ -0.00]( 2.70) 0.96 [ 4.48]( 8.12)
> 512 1.00 [ -0.00]( 0.00) 1.00 [ -0.00]( 0.00)
> 768 1.00 [ -0.00]( 1.66) 1.01 [ -1.47]( 2.52)
> 1024 1.00 [ -0.00]( 3.32) 1.01 [ -0.59]( 0.66)
>
> Note: The absolute numbers are very small until 256 threads (~10-15us)
> which may cause a small variation to appear as a large regression.
>
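The arithmetic behind the note above can be sketched as follows. The raw
latencies are not given in the thread, so the 11us and 14us values are
hypothetical, picked only because they fall in the ~10-15us range the note
mentions. The function name is likewise an assumption.

```python
def pct_improvement_lower_is_better(baseline_us, patched_us):
    # Positive means the patched value is lower (better) than the baseline.
    return (baseline_us - patched_us) / baseline_us * 100.0

# The same 3us shift, applied to a ~11us baseline and to a ~1.1ms baseline.
small = pct_improvement_lower_is_better(11.0, 14.0)
large = pct_improvement_lower_is_better(1100.0, 1103.0)
print(f"{small:.2f}% vs {large:.2f}%")  # prints "-27.27% vs -0.27%"
```

A shift of a few microseconds is therefore enough to produce a
double-digit percentage at these magnitudes, while the same shift would be
lost in the noise at higher absolute latencies.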
> ==================================================================
> Test : new-schbench-request-latency
> Units : Normalized 99th percentile latency in us
> Interpretation: Lower is better
> Statistic : Median
> ==================================================================
> #workers: tip[pct imp](CV) preempt_short_opt[pct imp](CV)
> 1 1.00 [ -0.00]( 0.14) 1.01 [ -1.06]( 0.27)
> 2 1.00 [ -0.00]( 0.14) 1.02 [ -1.60]( 0.23)
> 4 1.00 [ -0.00]( 0.00) 1.05 [ -4.53]( 1.73)
> 8 1.00 [ -0.00]( 0.14) 1.01 [ -0.53]( 0.14)
> 16 1.00 [ -0.00]( 1.49) 1.00 [ -0.26]( 1.23)
> 32 1.00 [ -0.00]( 0.89) 1.01 [ -0.79]( 0.00)
> 64 1.00 [ -0.00]( 1.43) 1.00 [ -0.26]( 0.98)
> 128 1.00 [ -0.00]( 2.78) 1.01 [ -1.18]( 4.09)
> 256 1.00 [ -0.00]( 0.13) 1.00 [ -0.25]( 0.26)
> 512 1.00 [ -0.00]( 6.72) 1.02 [ -2.20]( 5.45)
> 768 1.00 [ -0.00]( 3.42) 1.01 [ -0.52]( 4.21)
> 1024 1.00 [ -0.00]( 4.37) 1.01 [ -1.19]( 2.15)
>
> >
> > Several use cases have been used to test the scheduling latency of short
> > slice tasks:
> > - cyclictest with a 3777us period and an 8ms slice alone
> > - cyclictest with a 3777us period and an 8ms slice. 2xNR_CPUS rt-app
> > tasks that run (8177us) and sleep (17777us) with a 16ms slice.
> > - cyclictest with a 3777us period and an 8ms slice. Hackbench with
> > 1 group using thread and pipe and a 16ms slice.
>
> I'll go check those configurations next and report back if I find
> anything out of the ordinary. Feel free to include:
>
> Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
Thanks for the tests
>
> --
> Thanks and Regards,
> Prateek
>