Re: [PATCH 00/19] sched: Core Scheduling

From: Ning, Hongyu
Date: Fri Apr 30 2021 - 02:47:24 EST



On 2021/4/22 20:04, Peter Zijlstra wrote:
> Hai,
>
> This is an aggressive fold of all the core-scheduling work so far. I've stripped
> a whole bunch of tags along the way (hopefully not too many, please yell if you
> feel I made a mistake), including tested-by. Please retest.
>
> Changes since the last partial post: dropping all the cgroup stuff and
> PR_SCHED_CORE_CLEAR, as well as that exec() behaviour, in order to resolve
> the cgroup issue later.
>
> Since we're really rather late for the coming merge window, my plan was to
> merge the lot right after the merge window.
>
> Again, please test.
>
> These patches should shortly be available in my queue.git.
>
> ---
> b/kernel/sched/core_sched.c | 229 ++++++
> b/tools/testing/selftests/sched/.gitignore | 1
> b/tools/testing/selftests/sched/Makefile | 14
> b/tools/testing/selftests/sched/config | 1
> b/tools/testing/selftests/sched/cs_prctl_test.c | 338 +++++++++
> include/linux/sched.h | 19
> include/uapi/linux/prctl.h | 8
> kernel/Kconfig.preempt | 6
> kernel/fork.c | 4
> kernel/sched/Makefile | 1
> kernel/sched/core.c | 858 ++++++++++++++++++++++--
> kernel/sched/cpuacct.c | 12
> kernel/sched/deadline.c | 38 -
> kernel/sched/debug.c | 4
> kernel/sched/fair.c | 276 +++++--
> kernel/sched/idle.c | 13
> kernel/sched/pelt.h | 2
> kernel/sched/rt.c | 31
> kernel/sched/sched.h | 393 ++++++++--
> kernel/sched/stop_task.c | 14
> kernel/sched/topology.c | 4
> kernel/sys.c | 5
> tools/include/uapi/linux/prctl.h | 8
> 23 files changed, 2057 insertions(+), 222 deletions(-)
>


Adding sysbench, uperf and will-it-scale (wis) performance results for reference:

- kernels under test:
-- the core-scheduling patchset above, plus a local fix for the softlockup issue: https://lore.kernel.org/lkml/5c289c5a-a120-a1d0-ca89-2724a1445fe8@xxxxxxxxxxxxxxx/
-- coresched_v10 (baseline for all normalized results), kernel source: https://github.com/digitalocean/linux-coresched/commits/coresched/v10-v5.10.y

- workloads:
-- A. sysbench cpu (192 threads) + sysbench cpu (192 threads)
-- B. sysbench cpu (192 threads) + sysbench mysql (192 threads)
-- C. uperf netperf.xml (192 threads; TCP and UDP runs done separately)
-- D. will-it-scale pipe-based context_switch (192 threads)
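For reference, a minimal sketch of what "prctl on workload ..." in the tables below could look like on the coresched kernel. Assumptions: the PR_SCHED_CORE values are the ones from include/uapi/linux/prctl.h as later merged into mainline (they may differ in this posting), SCOPE_THREAD_GROUP and the cs_* helpers are hypothetical names, and this is not the harness actually used for these runs.

```c
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>

/* Fallbacks for older headers; values assumed to match the uapi
 * prctl.h from the core-scheduling series. */
#ifndef PR_SCHED_CORE
#define PR_SCHED_CORE            62
#define PR_SCHED_CORE_GET        0
#define PR_SCHED_CORE_CREATE     1
#define PR_SCHED_CORE_SHARE_TO   2
#define PR_SCHED_CORE_SHARE_FROM 3
#endif

/* Hypothetical names, assumed to match the kernel's enum pid_type. */
#define SCOPE_THREAD       0 /* PIDTYPE_PID  */
#define SCOPE_THREAD_GROUP 1 /* PIDTYPE_TGID */

/* Give the whole thread group of `pid` (0 = caller) a fresh core-scheduling
 * cookie. Returns 0 on success, otherwise the errno value from prctl(). */
static int cs_create_cookie(pid_t pid)
{
    if (prctl(PR_SCHED_CORE, (unsigned long)PR_SCHED_CORE_CREATE,
              (unsigned long)pid, (unsigned long)SCOPE_THREAD_GROUP, 0UL) == -1)
        return errno;
    return 0;
}

/* Read the (opaque, hashed) cookie id of a single task; 0 means untagged.
 * The kernel requires an 8-byte aligned user pointer here. */
static int cs_get_cookie(pid_t pid, unsigned long long *cookie)
{
    if (prctl(PR_SCHED_CORE, (unsigned long)PR_SCHED_CORE_GET,
              (unsigned long)pid, (unsigned long)SCOPE_THREAD,
              (unsigned long)cookie) == -1)
        return errno;
    return 0;
}
```

A launcher would call cs_create_cookie(0) and then exec sysbench/uperf, relying on the cookie being inherited across fork() and kept across exec() (the cover letter says the exec() behaviour was dropped; verify both against the tree under test). On mainline, EINVAL typically means the kernel lacks CONFIG_SCHED_CORE and ENODEV means SMT is not present.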

- test machine setup:
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 2
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 4
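The topology adds up to 2 sockets x 48 cores x 2 threads = 192 logical CPUs, i.e. 96 physical cores whose two SMT siblings core scheduling has to coordinate. How SMT was turned off for the smtoff rows isn't stated; as a sketch, a run script could record which mode the machine was in via the kernel's /sys/devices/system/cpu/smt/active file (the helper names below are hypothetical):

```c
#include <stdio.h>

/* Logical CPU count implied by an lscpu-style topology summary. */
static int logical_cpus(int sockets, int cores_per_socket, int threads_per_core)
{
    return sockets * cores_per_socket * threads_per_core;
}

/* Returns 1 if SMT siblings are active, 0 if not, and -1 if
 * /sys/devices/system/cpu/smt/active is missing or unreadable
 * (older kernels do not provide it). */
static int smt_active(void)
{
    int value = -1;
    FILE *f = fopen("/sys/devices/system/cpu/smt/active", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1 || (value != 0 && value != 1))
        value = -1;
    fclose(f);
    return value;
}
```

With SMT off, only 96 logical CPUs would be online for the 192-thread workloads, which is worth keeping in mind when reading the smtoff rows.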

- key performance changes:
--workload B, coresched (cs_on): sysbench mysql throughput drops about 23% (0.77) vs coresched_v10
--workload C, coresched (cs_on): uperf throughput nearly doubles (1.87 TCP, 1.99 UDP) vs coresched_v10
--workload C, default (cs_off): uperf throughput drops by more than 20% (0.78 TCP, 0.74 UDP) vs coresched_v10; the same regression is seen on the plain v5.12-rc8 base (without the coresched patchset)
--workload D, coresched (cs_on): will-it-scale (wis) result nearly doubles (1.98) vs coresched_v10
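The percentages above follow directly from the normalized values in the tables below: percent change = (normalized - 1) * 100. A small sketch that recomputes the headline numbers (values copied from the tables; the struct and helper names are made up for illustration):

```c
#include <stdio.h>

/* Percent change implied by a result normalized to coresched_v10,
 * where 1.00 means parity with the baseline. */
static double pct_change(double normalized)
{
    return (normalized - 1.0) * 100.0;
}

struct data_point {
    const char *label;
    double normalized;
};

/* Headline values taken from the workload B, C and D tables. */
static const struct data_point key_points[] = {
    { "B: sysbench mysql, cs_on", 0.77 },
    { "C: uperf TCP, cs_on",      1.87 },
    { "C: uperf UDP, cs_on",      1.99 },
    { "C: uperf TCP, cs_off",     0.78 },
    { "C: uperf UDP, cs_off",     0.74 },
    { "D: will-it-scale, cs_on",  1.98 },
};

static void print_key_points(void)
{
    for (size_t i = 0; i < sizeof key_points / sizeof key_points[0]; i++)
        printf("%-26s %5.2f  %+4.0f%%\n", key_points[i].label,
               key_points[i].normalized, pct_change(key_points[i].normalized));
}
```

For example, 0.77 corresponds to a 23% drop and 1.99 to a 99% gain, which is where "about 23%" and "nearly doubles" come from.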

- per-workload results, normalized to the coresched_v10 results (1.00 = parity with coresched_v10):
--workload A:
Note:
* no significant performance change vs coresched_v10 (all normalized results within 3% of 1.00)
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| | ** | coresched_peterz_aubrey_fix_base_v5.12-rc8 | coresched_peterz_aubrey_fix_base_v5.12-rc8 | *** | coresched_v10_base_v5.10.11 | coresched_v10_base_v5.10.11 |
+=======================================+======+==============================================+================================================+=======+===============================+=================================+
| workload | ** | sysbench cpu * 192 | sysbench cpu * 192 | *** | sysbench cpu * 192 | sysbench cpu * 192 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| prctl/cgroup | ** | prctl on workload cpu_0 | prctl on workload cpu_1 | *** | cg_sysbench_cpu_0 | cg_sysbench_cpu_1 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| record_item | ** | Tput_avg (events/s) | Tput_avg (events/s) | *** | Tput_avg (events/s) | Tput_avg (events/s) |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| coresched normalized vs coresched_v10 | ** | 0.99 | 1.01 | *** | 1 | 1 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| default normalized vs coresched_v10 | ** | 1.03 | 0.98 | *** | 1 | 1 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| smtoff normalized vs coresched_v10 | ** | 1.01 | 0.99 | *** | 1 | 1 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+

--workload B:
Note:
* coresched (cs_on), sysbench mysql throughput drops about 23% (0.77) vs coresched_v10
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| | ** | coresched_peterz_aubrey_fix_base_v5.12-rc8 | coresched_peterz_aubrey_fix_base_v5.12-rc8 | *** | coresched_v10_base_v5.10.11 | coresched_v10_base_v5.10.11 |
+=======================================+======+==============================================+================================================+=======+===============================+=================================+
| workload | ** | sysbench cpu * 192 | sysbench mysql * 192 | *** | sysbench cpu * 192 | sysbench mysql * 192 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| prctl/cgroup | ** | prctl on workload cpu_0 | prctl on workload mysql_0 | *** | cg_sysbench_cpu_0 | cg_sysbench_mysql_0 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| record_item | ** | Tput_avg (events/s) | Tput_avg (events/s) | *** | Tput_avg (events/s) | Tput_avg (events/s) |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| coresched normalized vs coresched_v10 | ** | 1.03 | 0.77 | *** | 1 | 1 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| default normalized vs coresched_v10 | ** | 1.02 | 0.9 | *** | 1 | 1 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| smtoff normalized vs coresched_v10 | ** | 0.94 | 1.14 | *** | 1 | 1 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+

--workload C:
Note:
* coresched (cs_on), uperf throughput nearly doubles (1.87 TCP, 1.99 UDP) vs coresched_v10
* default (cs_off), uperf throughput drops by more than 20% (0.78 TCP, 0.74 UDP) vs coresched_v10; the same regression is seen on the plain v5.12-rc8 base (without the coresched patchset)
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| | ** | coresched_peterz_aubrey_fix_base_v5.12-rc8 | coresched_peterz_aubrey_fix_base_v5.12-rc8 | *** | coresched_v10_base_v5.10.11 | coresched_v10_base_v5.10.11 |
+=======================================+======+==============================================+================================================+=======+===============================+=================================+
| workload | ** | uperf netperf TCP * 192 | uperf netperf UDP * 192 | *** | uperf netperf TCP * 192 | uperf netperf UDP * 192 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| prctl/cgroup | ** | prctl on workload uperf | prctl on workload uperf | *** | cg_uperf | cg_uperf |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| record_item | ** | Tput_avg (Gb/s) | Tput_avg (Gb/s) | *** | Tput_avg (Gb/s) | Tput_avg (Gb/s) |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| coresched normalized vs coresched_v10 | ** | 1.87 | 1.99 | *** | 1 | 1 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| default normalized vs coresched_v10 | ** | 0.78 | 0.74 | *** | 1 | 1 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+
| smtoff normalized vs coresched_v10 | ** | 0.87 | 0.95 | *** | 1 | 1 |
+---------------------------------------+------+----------------------------------------------+------------------------------------------------+-------+-------------------------------+---------------------------------+

--workload D:
Note:
* coresched (cs_on), will-it-scale (wis) result nearly doubles (1.98) vs coresched_v10
+---------------------------------------+------+----------------------------------------------+-------+-------------------------------+
| | ** | coresched_peterz_aubrey_fix_base_v5.12-rc8 | *** | coresched_v10_base_v5.10.11 |
+=======================================+======+==============================================+=======+===============================+
| workload | ** | will-it-scale * 192 | *** | will-it-scale * 192 |
| | | (pipe based context_switch) | | (pipe based context_switch) |
+---------------------------------------+------+----------------------------------------------+-------+-------------------------------+
| prctl/cgroup | ** | prctl on workload wis | *** | cg_wis |
+---------------------------------------+------+----------------------------------------------+-------+-------------------------------+
| record_item | ** | threads_avg | *** | threads_avg |
+---------------------------------------+------+----------------------------------------------+-------+-------------------------------+
| coresched normalized vs coresched_v10 | ** | 1.98 | *** | 1 |
+---------------------------------------+------+----------------------------------------------+-------+-------------------------------+
| default normalized vs coresched_v10 | ** | 1.13 | *** | 1 |
+---------------------------------------+------+----------------------------------------------+-------+-------------------------------+
| smtoff normalized vs coresched_v10 | ** | 1.32 | *** | 1 |
+---------------------------------------+------+----------------------------------------------+-------+-------------------------------+
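Workload D measures how fast tasks can hand a token back and forth through pipes, so every round trip puts one task to sleep and wakes another, exercising the pick path that core scheduling makes core-wide. A minimal sketch of that ping-pong pattern for a single pair of processes (an illustration modeled on the workload description, not will-it-scale's source; the benchmark itself runs 192 copies in parallel, and pipe_pingpong is a hypothetical name):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Bounce one byte between parent and child `iterations` times.
 * Returns the number of completed round trips, or -1 on setup failure. */
static long pipe_pingpong(long iterations)
{
    int to_child[2], to_parent[2];
    char byte = 0;
    long done;
    pid_t pid;

    if (pipe(to_child) == -1)
        return -1;
    if (pipe(to_parent) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid == -1) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: echo every byte back until the parent closes its end. */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &byte, 1) == 1)
            if (write(to_parent[1], &byte, 1) != 1)
                break;
        _exit(0);
    }

    /* Parent: each iteration blocks in read() until the child answers,
     * so both tasks sleep and are woken once per round trip. */
    close(to_child[0]);
    close(to_parent[1]);
    for (done = 0; done < iterations; done++)
        if (write(to_child[1], &byte, 1) != 1 ||
            read(to_parent[0], &byte, 1) != 1)
            break;

    close(to_child[1]); /* child's read() now returns 0 and it exits */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    return done;
}
```

Timing a fixed number of round trips with and without a core-scheduling cookie on the pair would give a rough, single-pair view of the same effect.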

-- notes on the record_item rows:
* coresched normalized vs coresched_v10: SMT on, core scheduling enabled; result normalized to coresched_v10 under the same config
* default normalized vs coresched_v10: SMT on, core scheduling disabled; result normalized to coresched_v10 under the same config
* smtoff normalized vs coresched_v10: SMT off; result normalized to coresched_v10 under the same config
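Because each row is normalized against coresched_v10 under the same config, the rows are not directly comparable with each other: in workload C, for example, coresched 1.87 vs default 0.78 does not by itself say cs_on beats cs_off on this kernel, since the two denominators come from different v10 runs. A sketch of that rule (type and function names are hypothetical; the numbers in the usage are illustrative, not measured):

```c
/* The three configurations described in the notes above. */
enum run_config { CS_ON, CS_OFF, SMT_OFF };

struct run {
    enum run_config config;
    double tput; /* e.g. events/s, Gb/s or wis threads_avg */
};

/* Returns result.tput / baseline.tput, or -1.0 if the two runs were taken
 * under different configs (not a like-for-like comparison) or the
 * baseline is not positive. */
static double normalize(struct run result, struct run baseline)
{
    if (result.config != baseline.config || baseline.tput <= 0.0)
        return -1.0;
    return result.tput / baseline.tput;
}
```

For instance, a cs_on run at 150 against a cs_on baseline at 100 normalizes to 1.5, while pairing it with an smtoff baseline is rejected.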

Hongyu