Re: [PATCH v2] sched/fair: Preserve PLACE_DEADLINE_INITIAL deadline

From: K Prateek Nayak
Date: Thu Oct 12 2023 - 00:48:37 EST


Hello Daniel,

Same as v1, I do not see any regressions with this version either.
I'll leave the full results below.

o Machine details

- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- C2 Disabled (POLL and C1(MWAIT) remained enabled)

o Kernel Details

- tip: tip:sched/core at commit 238437d88cea ("intel_idle: Add ibrs_off
module parameter to force-disable IBRS")
[For DeathStarBench comparisons alone since I ran to the issue
which below commit solves]
+ min_deadline fix commit 8dafa9d0eb1a ("sched/eevdf: Fix
min_deadline heap integrity") from tip:sched/urgent

- place-initial-fix: tip + this patch as is

o Benchmark Results

==================================================================
Test : hackbench
Units : Normalized time in seconds
Interpretation: Lower is better
Statistic : AMean
==================================================================
Case: tip[pct imp](CV) place-initial-fix[pct imp](CV)
1-groups 1.00 [ -0.00]( 2.11) 1.01 [ -1.08]( 2.60)
2-groups 1.00 [ -0.00]( 1.31) 1.01 [ -0.93]( 1.61)
4-groups 1.00 [ -0.00]( 1.04) 1.00 [ -0.00]( 1.25)
8-groups 1.00 [ -0.00]( 1.34) 0.99 [ 1.15]( 0.85)
16-groups 1.00 [ -0.00]( 2.45) 1.00 [ -0.27]( 2.32)


==================================================================
Test : tbench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) place-initial-fix[pct imp](CV)
1 1.00 [ 0.00]( 0.46) 0.99 [ -0.59]( 0.88)
2 1.00 [ 0.00]( 0.64) 0.99 [ -1.43]( 0.69)
4 1.00 [ 0.00]( 0.59) 0.99 [ -1.49]( 0.76)
8 1.00 [ 0.00]( 0.34) 1.00 [ -0.35]( 0.20)
16 1.00 [ 0.00]( 0.72) 0.98 [ -1.96]( 1.97)
32 1.00 [ 0.00]( 0.65) 1.00 [ -0.24]( 1.07)
64 1.00 [ 0.00]( 0.59) 1.00 [ -0.14]( 1.18)
128 1.00 [ 0.00]( 1.19) 0.99 [ -1.04]( 0.93)
256 1.00 [ 0.00]( 0.16) 1.00 [ -0.18]( 0.34)
512 1.00 [ 0.00]( 0.20) 0.99 [ -0.62]( 0.02)
1024 1.00 [ 0.00]( 0.06) 1.00 [ -0.49]( 0.37)


==================================================================
Test : stream-10
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) place-initial-fix[pct imp](CV)
Copy 1.00 [ 0.00]( 6.04) 1.00 [ -0.21]( 7.98)
Scale 1.00 [ 0.00]( 5.44) 0.99 [ -0.75]( 5.75)
Add 1.00 [ 0.00]( 5.44) 0.99 [ -1.48]( 5.40)
Triad 1.00 [ 0.00]( 7.82) 1.02 [ 2.21]( 8.33)


==================================================================
Test : stream-100
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) place-initial-fix[pct imp](CV)
Copy 1.00 [ 0.00]( 1.14) 1.00 [ 0.40]( 1.12)
Scale 1.00 [ 0.00]( 4.60) 1.01 [ 1.05]( 4.99)
Add 1.00 [ 0.00]( 4.91) 1.00 [ -0.14]( 4.97)
Triad 1.00 [ 0.00]( 0.60) 0.96 [ -3.53]( 6.13)


==================================================================
Test : netperf
Units : Normalized Througput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) place-initial-fix[pct imp](CV)
1-clients 1.00 [ 0.00]( 0.61) 1.00 [ 0.40]( 0.75)
2-clients 1.00 [ 0.00]( 0.44) 1.00 [ -0.47]( 0.91)
4-clients 1.00 [ 0.00]( 0.75) 1.00 [ -0.23]( 0.84)
8-clients 1.00 [ 0.00]( 0.65) 1.00 [ -0.07]( 0.62)
16-clients 1.00 [ 0.00]( 0.49) 1.00 [ -0.29]( 0.56)
32-clients 1.00 [ 0.00]( 0.57) 1.00 [ -0.14]( 0.46)
64-clients 1.00 [ 0.00]( 1.67) 1.00 [ -0.14]( 1.81)
128-clients 1.00 [ 0.00]( 1.11) 1.01 [ 0.64]( 1.04)
256-clients 1.00 [ 0.00]( 2.64) 0.99 [ -1.29]( 5.25)
512-clients 1.00 [ 0.00](52.49) 0.99 [ -0.57](53.01)


==================================================================
Test : schbench
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) place-initial-fix[pct imp](CV)
1 1.00 [ -0.00]( 8.41) 1.05 [ -5.41](13.45)
2 1.00 [ -0.00]( 5.29) 0.88 [ 12.50](13.21)
4 1.00 [ -0.00]( 1.32) 1.00 [ -0.00]( 4.80)
8 1.00 [ -0.00]( 9.52) 0.94 [ 6.25]( 8.85)
16 1.00 [ -0.00]( 1.61) 0.97 [ 3.23]( 5.00)
32 1.00 [ -0.00]( 7.27) 0.88 [ 12.50]( 2.30)
64 1.00 [ -0.00]( 6.96) 1.07 [ -6.94]( 4.94)
128 1.00 [ -0.00]( 3.41) 0.99 [ 1.44]( 2.69)
256 1.00 [ -0.00](32.95) 0.81 [ 19.17](16.38)
512 1.00 [ -0.00]( 3.20) 0.98 [ 1.66]( 2.35)


==================================================================
Test : ycsb-cassandra
Units : Normalized throughput
Interpretation: Higher is better
Statistic : Mean
==================================================================
metric tip place-initial-fix(%diff)
throughput 1.00 0.99 (%diff: -0.67%)


==================================================================
Test : ycsb-mondodb
Units : Normalized throughput
Interpretation: Higher is better
Statistic : Mean
==================================================================
metric tip place-initial-fix(%diff)
throughput 1.00 0.99 (%diff: -0.68%)


==================================================================
Test : DeathStarBench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : Mean
Note : Comparisons contains additional commit 8dafa9d0eb1a
("sched/eevdf: Fix min_deadline heap integrity") from
tip:sched/urgent to fix an EEVDF issue being hit
==================================================================
Pinning scaling tip place-initial-fix (%diff)
1CCD 1 1.00 1.00 (%diff: -0.09%)
2CCD 2 1.00 1.02 (%diff: 2.46%)
4CCD 4 1.00 1.00 (%diff: 0.45%)
8CCD 8 1.00 1.00 (%diff: -0.46%)

--

On 10/4/2023 6:39 PM, Daniel Jordan wrote:
> An entity is supposed to get an earlier deadline with
> PLACE_DEADLINE_INITIAL when it's forked, but the deadline gets
> overwritten soon after in enqueue_entity() the first time a forked
> entity is woken so that PLACE_DEADLINE_INITIAL is effectively a no-op.
>
> Placing in task_fork_fair() seems unnecessary since none of the values
> that get set (slice, vruntime, deadline) are used before they're set
> again at enqueue time, so get rid of that (and with it all of
> task_fork_fair()) and just pass ENQUEUE_INITIAL to enqueue_entity() via
> wake_up_new_task().
>
> Fixes: e8f331bcc270 ("sched/smp: Use lag to simplify cross-runqueue placement")
> Signed-off-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>

Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>

> ---
>
> v2
> - place_entity() seems like the only reason for task_fork_fair() to exist
> after the recent removal of sysctl_sched_child_runs_first, so take out
> the whole function.
>
> Still based on today's peterz/sched/eevdf
>
> kernel/sched/core.c | 2 +-
> kernel/sched/fair.c | 24 ------------------------
> 2 files changed, 1 insertion(+), 25 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 779cdc7969c81..500e2dbfd41dd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4854,7 +4854,7 @@ void wake_up_new_task(struct task_struct *p)
> update_rq_clock(rq);
> post_init_entity_util_avg(p);
>
> - activate_task(rq, p, ENQUEUE_NOCLOCK);
> + activate_task(rq, p, ENQUEUE_INITIAL | ENQUEUE_NOCLOCK);
> trace_sched_wakeup_new(p);
> wakeup_preempt(rq, p, WF_FORK);
> #ifdef CONFIG_SMP
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a0b4dac2662c9..3827b302eeb9b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12427,29 +12427,6 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
> task_tick_core(rq, curr);
> }
>
> -/*
> - * called on fork with the child task as argument from the parent's context
> - * - child not yet on the tasklist
> - * - preemption disabled
> - */
> -static void task_fork_fair(struct task_struct *p)
> -{
> - struct sched_entity *se = &p->se, *curr;
> - struct cfs_rq *cfs_rq;
> - struct rq *rq = this_rq();
> - struct rq_flags rf;
> -
> - rq_lock(rq, &rf);
> - update_rq_clock(rq);
> -
> - cfs_rq = task_cfs_rq(current);
> - curr = cfs_rq->curr;
> - if (curr)
> - update_curr(cfs_rq);
> - place_entity(cfs_rq, se, ENQUEUE_INITIAL);
> - rq_unlock(rq, &rf);
> -}
> -
> /*
> * Priority of the task has changed. Check to see if we preempt
> * the current task.
> @@ -12953,7 +12930,6 @@ DEFINE_SCHED_CLASS(fair) = {
> #endif
>
> .task_tick = task_tick_fair,
> - .task_fork = task_fork_fair,
>
> .prio_changed = prio_changed_fair,
> .switched_from = switched_from_fair,

--
Thanks and Regards,
Prateek