FW: [RFC PATCH 00/13] Core scheduling v5
From: Gruza, Agata
Date: Thu May 14 2020 - 20:57:02 EST
-----Original Message-----
From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-owner@xxxxxxxxxxxxxxx> On Behalf Of Ning, Hongyu
Sent: Friday, May 8, 2020 8:40 PM
To: vpillai@xxxxxxxxxxxxxxxx; naravamudan@xxxxxxxxxxxxxxxx; jdesfossez@xxxxxxxxxxxxxxxx; peterz@xxxxxxxxxxxxx; Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>; mingo@xxxxxxxxxx; tglx@xxxxxxxxxxxxx; pjt@xxxxxxxxxx; torvalds@xxxxxxxxxxxxxxxxxxxx
Cc: vpillai@xxxxxxxxxxxxxxxx; fweisbec@xxxxxxxxx; keescook@xxxxxxxxxxxx; kerrnel@xxxxxxxxxx; pauld@xxxxxxxxxx; aaron.lwe@xxxxxxxxx; aubrey.intel@xxxxxxxxx; Li, Aubrey <aubrey.li@xxxxxxxxxxxxxxx>; valentin.schneider@xxxxxxx; mgorman@xxxxxxxxxxxxxxxxxxx; pawan.kumar.gupta@xxxxxxxxxxxxxxx; pbonzini@xxxxxxxxxx; joelaf@xxxxxxxxxx; joel@xxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: [RFC PATCH 00/13] Core scheduling v5
- Test environment:
Intel Xeon Server platform
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 2
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 4
- Kernel under test:
Core scheduling v5 base
https://github.com/digitalocean/linux-coresched/tree/coresched/v5-v5.5.y
- Test set based on sysbench 1.1.0-bd4b418:
A: sysbench cpu in cgroup cpu 1 + sysbench mysql in cgroup mysql 1 (192 workload tasks for each cgroup)
B: sysbench cpu in cgroup cpu 1 + sysbench cpu in cgroup cpu 2 + sysbench mysql in cgroup mysql 1 + sysbench mysql in cgroup mysql 2 (192 workload tasks for each cgroup)
- Test results briefing:
1 Good results:
1.1 For test set A, coresched achieves comparable or better performance than smt_off, for both the cpu workload and the mysql workload
1.2 For test set B, cpu workload, coresched could achieve better performance compared to smt_off
2 Bad results:
2.1 For test set B, mysql workload, coresched performance is lower than smt_off, potential fairness issue between cpu workloads and mysql workloads
2.2 For test set B, cpu workload, potential fairness issue between 2 cgroups cpu workloads
- Test results:
Note: results in the following tables are throughput (Tput) normalized to the default baseline
-- Test set A Tput normalized results:
+--------------------+--------+-----------+-------------+-----------+-------+-------------+---------------+-------------+
| | **** | default | coresched | smt_off | *** | default | coresched | smt_off |
+====================+========+===========+=============+===========+=======+=============+===============+=============+
| cgroups | **** | cg cpu 1 | cg cpu 1 | cg cpu 1 | *** | cg mysql 1 | cg mysql 1 | cg mysql 1 |
+--------------------+--------+-----------+-------------+-----------+-------+-------------+---------------+-------------+
| sysbench workload | **** | cpu | cpu | cpu | *** | mysql | mysql | mysql |
+--------------------+--------+-----------+-------------+-----------+-------+-------------+---------------+-------------+
| 192 tasks / cgroup | **** | 1 | 0.95 | 0.54 | *** | 1 | 0.92 | 0.97 |
+--------------------+--------+-----------+-------------+-----------+-------+-------------+---------------+-------------+
-- Test set B Tput normalized results:
+--------------------+--------+-----------+-------------+-----------+-------+-------------+---------------+-------------+------+-------------+---------------+-------------+-----+-------------+---------------+-------------+
| | **** | default | coresched | smt_off | *** | default | coresched | smt_off | ** | default | coresched | smt_off | * | default | coresched | smt_off |
+====================+========+===========+=============+===========+=======+=============+===============+=============+======+=============+===============+=============+=====+=============+===============+=============+
| cgroups | **** | cg cpu 1 | cg cpu 1 | cg cpu 1 | *** | cg cpu 2 | cg cpu 2 | cg cpu 2 | ** | cg mysql 1 | cg mysql 1 | cg mysql 1 | * | cg mysql 2 | cg mysql 2 | cg mysql 2 |
+--------------------+--------+-----------+-------------+-----------+-------+-------------+---------------+-------------+------+-------------+---------------+-------------+-----+-------------+---------------+-------------+
| sysbench workload | **** | cpu | cpu | cpu | *** | cpu | cpu | cpu | ** | mysql | mysql | mysql | * | mysql | mysql | mysql |
+--------------------+--------+-----------+-------------+-----------+-------+-------------+---------------+-------------+------+-------------+---------------+-------------+-----+-------------+---------------+-------------+
| 192 tasks / cgroup | **** | 1 | 0.9 | 0.47 | *** | 1 | 1.32 | 0.66 | ** | 1 | 0.42 | 0.89 | * | 1 | 0.42 | 0.89 |
+--------------------+--------+-----------+-------------+-----------+-------+-------------+---------------+-------------+------+-------------+---------------+-------------+-----+-------------+---------------+-------------+
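The fairness concern in 2.2 can be estimated from the normalized figures alone. The following is a minimal sketch that uses only the test set B values from the table above. The helper name `spread` is made up for illustration, and comparing normalized values across cgroups assumes both cgroups of a given workload type have a similar default baseline, which the results above do not state explicitly.

```python
# Normalized throughput for test set B, copied from the table above.
# Outer key: cgroup. Inner key: kernel configuration.
set_b = {
    "cg cpu 1":   {"default": 1.0, "coresched": 0.90, "smt_off": 0.47},
    "cg cpu 2":   {"default": 1.0, "coresched": 1.32, "smt_off": 0.66},
    "cg mysql 1": {"default": 1.0, "coresched": 0.42, "smt_off": 0.89},
    "cg mysql 2": {"default": 1.0, "coresched": 0.42, "smt_off": 0.89},
}


def spread(results, groups, config):
    """Ratio of best to worst normalized Tput among `groups`.

    1.0 means the cgroups were treated evenly; larger values mean one
    cgroup got a bigger share than the other (hypothetical metric).
    """
    values = [results[group][config] for group in groups]
    return max(values) / min(values)


cpu_groups = ["cg cpu 1", "cg cpu 2"]
mysql_groups = ["cg mysql 1", "cg mysql 2"]

for config in ("coresched", "smt_off"):
    print(
        f"{config}: cpu spread = {spread(set_b, cpu_groups, config):.2f}, "
        f"mysql spread = {spread(set_b, mysql_groups, config):.2f}"
    )
```

A spread well above 1.0 for the two cpu cgroups is what 2.2 describes; the two mysql cgroups report identical values, so their spread is 1.0.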
> On Date: Wed, 4 Mar 2020 16:59:50 +0000, vpillai <vpillai@xxxxxxxxxxxxxxxx> wrote:
> To: Nishanth Aravamudan <naravamudan@xxxxxxxxxxxxxxxx>, Julien
> Desfossez <jdesfossez@xxxxxxxxxxxxxxxx>, Peter Zijlstra
> <peterz@xxxxxxxxxxxxx>, Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>,
> mingo@xxxxxxxxxx, tglx@xxxxxxxxxxxxx, pjt@xxxxxxxxxx,
> torvalds@xxxxxxxxxxxxxxxxxxxx
> CC: vpillai <vpillai@xxxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx,
> fweisbec@xxxxxxxxx, keescook@xxxxxxxxxxxx, kerrnel@xxxxxxxxxx, Phil
> Auld <pauld@xxxxxxxxxx>, Aaron Lu <aaron.lwe@xxxxxxxxx>, Aubrey Li
> <aubrey.intel@xxxxxxxxx>, aubrey.li@xxxxxxxxxxxxxxx, Valentin
> Schneider <valentin.schneider@xxxxxxx>, Mel Gorman
> <mgorman@xxxxxxxxxxxxxxxxxxx>, Pawan Gupta
> <pawan.kumar.gupta@xxxxxxxxxxxxxxx>, Paolo Bonzini
> <pbonzini@xxxxxxxxxx>, Joel Fernandes <joelaf@xxxxxxxxxx>,
> joel@xxxxxxxxxxxxxxxxx
>
>
> Fifth iteration of the Core-Scheduling feature.
>
> Core scheduling is a feature that only allows trusted tasks to run
> concurrently on cpus sharing compute resources(eg: hyperthreads on a
> core). The goal is to mitigate the core-level side-channel attacks
> without requiring to disable SMT (which has a significant impact on
> performance in some situations). So far, the feature mitigates
> user-space to user-space attacks but not user-space to kernel attack,
> when one of the hardware thread enters the kernel (syscall, interrupt etc).
>
> By default, the feature doesn't change any of the current scheduler
> behavior. The user decides which tasks can run simultaneously on the
> same core (for now by having them in the same tagged cgroup). When a
> tag is enabled in a cgroup and a task from that cgroup is running on a
> hardware thread, the scheduler ensures that only idle or trusted tasks
> run on the other sibling(s). Besides security concerns, this feature
> can also be beneficial for RT and performance applications where we
> want to control how tasks make use of SMT dynamically.
>
> This version was focusing on performance and stability. Couple of
> crashes related to task tagging and cpu hotplug path were fixed.
> This version also improves the performance considerably by making task
> migration and load balancing coresched aware.
>
> In terms of performance, the major difference since the last iteration
> is that now even IO-heavy and mixed-resources workloads are less
> impacted by core-scheduling than by disabling SMT. Both host-level and
> VM-level benchmarks were performed. Details in:
> https://lkml.org/lkml/2020/2/12/1194
> https://lkml.org/lkml/2019/11/1/269
>
> v5 is rebased on top of 5.5.5(449718782a46)
> https://github.com/digitalocean/linux-coresched/tree/coresched/v5-v5.5.y
>
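The co-scheduling rule described in the cover letter above (a sibling may only run an idle thread or a trusted task) can be illustrated with a toy model. This is a sketch of the rule only, not the kernel implementation: the names `may_share_core` and `pick_for_sibling` are hypothetical, tags are modeled as plain integers, and treating all untagged tasks as sharing one default tag is an assumption of this sketch.

```python
from typing import List, Optional

# Assumption of this sketch: untagged tasks all carry the same default tag,
# so they are allowed to share a core with each other.
UNTAGGED = 0


def may_share_core(tag_a: Optional[int], tag_b: Optional[int]) -> bool:
    """Return True if tasks with these tags may run on sibling threads.

    None stands for an idle hardware thread, which is always compatible.
    """
    if tag_a is None or tag_b is None:
        return True
    return tag_a == tag_b


def pick_for_sibling(running_tag: Optional[int],
                     candidates: List[int]) -> Optional[int]:
    """Pick the first runnable task compatible with the running sibling.

    Returns None when no candidate matches, i.e. the sibling is forced idle.
    """
    for tag in candidates:
        if may_share_core(running_tag, tag):
            return tag
    return None


# A task tagged 1 is running; the sibling's queue holds tags 2, UNTAGGED and 1.
print(pick_for_sibling(1, [2, UNTAGGED, 1]))
# Only incompatible tasks are queued, so the sibling stays idle.
print(pick_for_sibling(1, [2, UNTAGGED]))
```

The forced-idle case in the second call is where core scheduling gives up throughput relative to plain SMT, which is why results depend on how well the mix of runnable tasks can be paired.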
----------------------------------------------------------------------
ABOUT:
----------------------------------------------------------------------
Hello,
Core scheduling is required to protect against leakage of sensitive
data allocated on a sibling thread. Our goal is to measure performance
impact of core scheduling across different workloads and show how it
evolved over time. Below you will find data based on core-sched (v5).
The attached PDF contains the system configuration as well as further
explanation of the findings.
----------------------------------------------------------------------
BENCHMARKS:
----------------------------------------------------------------------
- hammerdb : database benchmarking application
- sysbench-cpu : multi-threaded cpu benchmark
- sysbench-mysql: multi-threaded benchmark that tests an open source DBMS (MySQL)
- kernel-build : benchmark that builds the Linux kernel
----------------------------------------------------------------------
PERFORMANCE IMPACT:
----------------------------------------------------------------------
+--------------------+--------+--------------+-------------+-------------------+--------------------+----------------------+
| benchmark | **** | # of cgroups | overcommit | baseline + smt_on | coresched + smt_on | baseline + smt_off |
+====================+========+==============+=============+===================+====================+======================+
| hammerdb | **** | 2cgroups | 2x | 1 | 0.96 | 0.87 |
+--------------------+--------+--------------+-------------+-------------------+--------------------+----------------------+
| sysbench-cpu | **** | 2cgroups | 2x | 1 | 0.95 | 0.54 |
| sysbench-mysql | **** | | | 1 | 0.90 | 0.47 |
+--------------------+--------+--------------+-------------+-------------------+--------------------+----------------------+
| sysbench-cpu | **** | 4cgroups | 4x | 1 | 0.90 | 0.47 |
| sysbench-cpu | **** | | | 1 | 1.32 | 0.66 |
| sysbench-mysql | **** | | | 1 | 0.42 | 0.89 |
| sysbench-mysql | **** | | | 1 | 0.42 | 0.89 |
+--------------------+--------+--------------+-------------+-------------------+--------------------+----------------------+
| kernel-build | **** | 2cgroups | 0.5x | 1 | 1 | 0.93 |
| | **** | | 1x | 1 | 0.99 | 0.92 |
| | **** | | 2x | 1 | 0.98 | 0.91 |
+--------------------+--------+--------------+-------------+-------------------+--------------------+----------------------+
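To compare coresched directly against smt_off, the two normalized columns can be divided row by row. The sketch below uses only the values from the table above; the function name `relative_gain` is made up for illustration, and blank cgroup and overcommit cells are assumed to continue the value from the row above them.

```python
# Rows copied from the PERFORMANCE IMPACT table above.
# Columns: benchmark, cgroups, overcommit, coresched + smt_on, baseline + smt_off
rows = [
    ("hammerdb",       "2cgroups", "2x",   0.96, 0.87),
    ("sysbench-cpu",   "2cgroups", "2x",   0.95, 0.54),
    ("sysbench-mysql", "2cgroups", "2x",   0.90, 0.47),
    ("sysbench-cpu",   "4cgroups", "4x",   0.90, 0.47),
    ("sysbench-cpu",   "4cgroups", "4x",   1.32, 0.66),
    ("sysbench-mysql", "4cgroups", "4x",   0.42, 0.89),
    ("sysbench-mysql", "4cgroups", "4x",   0.42, 0.89),
    ("kernel-build",   "2cgroups", "0.5x", 1.00, 0.93),
    ("kernel-build",   "2cgroups", "1x",   0.99, 0.92),
    ("kernel-build",   "2cgroups", "2x",   0.98, 0.91),
]


def relative_gain(coresched, smt_off):
    """Fractional throughput gain of coresched over smt_off (negative = loss)."""
    return coresched / smt_off - 1.0


for benchmark, cgroups, overcommit, coresched, smt_off in rows:
    gain = relative_gain(coresched, smt_off)
    print(f"{benchmark:<15} {cgroups:<9} {overcommit:<5} {gain:+.0%}")

# Rows where core scheduling ends up slower than disabling SMT.
slower = [row for row in rows if row[3] < row[4]]
print(f"{len(slower)} of {len(rows)} rows are slower with coresched than smt_off")
```

Because both columns are normalized to the same baseline, the ratio is independent of the absolute throughput numbers.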
----------------------------------------------------------------------
TAKEAWAYS:
----------------------------------------------------------------------
1. Core scheduling performs better than turning off HT in most tested
configurations; the 4-cgroup sysbench-mysql case is the exception.
2. Impact of core scheduling depends on the workload and thread
scheduling intensity.
3. Core scheduling requires cgroups. Only tasks from the same tagged
cgroup are scheduled together on the same core.
4. In certain situations, core scheduling introduces an uneven load
distribution between multiple workload types. In such cases, a bias
towards the cpu-intensive workload is expected.
5. Load balancing is not perfect. It needs more work.
Many thanks,
--Agata
Attachment:
LKML_core_sched_v5.5.y.pdf