Re: [RFC PATCH 00/16] Core scheduling v6

From: Alexander Graf
Date: Wed Aug 26 2020 - 20:31:08 EST


Hi Vineeth,

On 30.06.20 23:32, Vineeth Remanan Pillai wrote:
Sixth iteration of the Core-Scheduling feature.

Core scheduling is a feature that allows only trusted tasks to run
concurrently on CPUs sharing compute resources (e.g. hyperthreads on a
core). The goal is to mitigate core-level side-channel attacks without
having to disable SMT (which has a significant impact on performance
in some situations). Core scheduling (as of v6) mitigates user-space
to user-space attacks, and user-to-kernel attacks when one of the
siblings enters the kernel via interrupts. It is still possible for a
task to attack the sibling thread when it enters the kernel via
syscalls.

By default, the feature doesn't change any of the current scheduler
behavior. The user decides which tasks can run simultaneously on the
same core (for now by having them in the same tagged cgroup). When a
tag is enabled in a cgroup and a task from that cgroup is running on a
hardware thread, the scheduler ensures that only idle or trusted tasks
run on the other sibling(s). Besides security concerns, this feature
can also be beneficial for RT and performance applications where we
want to control how tasks make use of SMT dynamically.
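
The sibling rule described above can be pictured with a toy model. This
is only an illustrative sketch, not the kernel implementation: the names
Task, can_share_core and pick_for_sibling are hypothetical, and the
"cookie" is modelled as a plain integer where 0 stands for an untagged
task.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Task:
    name: str
    cookie: int = 0  # assumption: 0 models an untagged task


def can_share_core(running: Optional[Task], candidate: Optional[Task]) -> bool:
    """Return True if both tasks may run on sibling threads at once.

    None models an idle hardware thread, which is compatible with anything.
    """
    if running is None or candidate is None:
        return True
    return running.cookie == candidate.cookie


def pick_for_sibling(running: Optional[Task],
                     runqueue: Sequence[Task]) -> Optional[Task]:
    """Pick the first runnable task that trusts the task on the sibling.

    Returning None means the sibling is left idle even though it has
    runnable tasks.
    """
    for task in runqueue:
        if can_share_core(running, task):
            return task
    return None


if __name__ == "__main__":
    vm_a = Task("vm-a/vcpu0", cookie=1)
    queue = [Task("vm-b/vcpu0", cookie=2), Task("vm-a/vcpu1", cookie=1)]
    print(pick_for_sibling(vm_a, queue))
```

In this model a sibling stays idle whenever nothing on its queue shares
the running task's cookie, which is where the cost relative to plain SMT
comes from.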

This iteration is mostly a cleanup of v5, except for one major feature:
pausing the sibling when a CPU enters the kernel via NMI/IRQ/softirq.
It also introduces documentation and includes minor crash fixes.

One major cleanup was removing the hotplug support and related code.
The hotplug-related crashes were not documented, and the fixes piled up
over time, leading to complex code. We were not able to reproduce the
crashes in the limited testing done. But if they are reproducible, we
don't want to hide them. We should document them and design better
fixes, if any are needed.

In terms of performance, the results in this release are similar to
v5. On an x86 system with N hardware threads:
- if only N/2 hardware threads are busy, the performance is similar
between baseline, corescheduling and nosmt
- if N hardware threads are busy with N different corescheduling
groups, the impact of corescheduling is similar to nosmt
- if N hardware threads are busy and multiple active threads share the
same corescheduling cookie, they gain a performance improvement over
nosmt.
The specific performance impact depends on the workload, but for a
heavily loaded 12-vCPU database VM (1 coresched tag) running on a NUMA
node with 36 hardware threads, alongside 96 mostly idle neighbor VMs
(each in their own coresched tag), the performance drops by 54% with
corescheduling and drops by 90% with nosmt.
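
To put the two figures side by side, the arithmetic below converts each
drop into remaining throughput relative to the baseline. It is a rough
sketch that uses only the 54% and 90% numbers quoted above; the helper
name remaining_throughput is made up for illustration.

```python
def remaining_throughput(drop_percent: int) -> int:
    """Throughput left, as a percentage of the baseline."""
    return 100 - drop_percent


coresched = remaining_throughput(54)  # drop reported with corescheduling
nosmt = remaining_throughput(90)      # drop reported with nosmt

# Throughput with corescheduling relative to throughput with nosmt.
ratio = coresched / nosmt

print(f"coresched keeps {coresched}% of baseline, nosmt keeps {nosmt}%")
print(f"coresched delivers {ratio}x the nosmt throughput")
```

For this one workload, corescheduling therefore retains a bit under half
of the baseline throughput, while nosmt retains a tenth of it.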

v6 is rebased on 5.7.6 (a06eb423367e)
https://github.com/digitalocean/linux-coresched/tree/coresched/v6-v5.7.y

As discussed during Linux Plumbers, here is a small repo with test scripts and applications that I've used to look at core scheduling unfairness:

https://github.com/agraf/schedgaps

Please let me know if it's unclear how to use it or if you see issues in your environment.

Please also make sure to run this only on idle, server-class hardware. Notebooks will almost certainly have too many uncontrollable sources of timing entropy to give sensible results.


Alex


