[PATCH] x86/resctrl: avoid compiler optimization in __resctrl_sched_in
From: Stephane Eranian
Date: Fri Mar 03 2023 - 18:11:45 EST
When compiling the kernel with clang, we have a problem with the code
in __rescrtl_sched_in() because of the way theinline is optimized when called
from __switch_to(). The effect of the problem is that the bandwidth
restriction driven by the CLOSId is not enforced. The problem is easy to
reproduce on Intel or AMD x86 systems:
1. If resctrl filesystem is not mounted:
$ mount -t resctrl none /sys/fs/resctrl
2. Create resctrl group:
$ mkdir /sys/fs/resctrl/test
3. move shell into resctrl group
$ echo $$ > /sys/fs/resctrl/test/tasks
4. Run bandwidth consuming test in background on CPU0
$ taskset -c 0 triad &
5. Monitor bandwidth consumption
Using perf to measure bandwidth on your processor
6. Restrict bandwidth
- Intel: $ echo MB:0=10 > /sys/fs/resctrl/test/schemata
- AMD: $ echo MB:0=240 > /sys/fs/resctrl/tests/schemata
7. Monitor bandwidth again
At 7, you will see that the restriction is not enforced.
The problem is located in the __resctrl_sched_in() routine which rewrites
the active closid via the PQR_ASSOC register. Because this is an expensive
operation, the kernel only does it when the context switch involves tasks
with different CLOSID. And to check that, it needs to access the current
task's closid field using current->closid. current is actually a macro
that reads the per-cpu variable pcpu_hot.current_task.
After an investigation by compiler experts, the problem has been tracked down
to the usage of the get_current() macro in the __resctrl_sched_in() code and
in particular the per-cpu macro:
static __always_inline struct task_struct *get_current(void)
{
return this_cpu_read_stable(pcpu_hot.current_task);
}
And as per percpu.h:
/*
* this_cpu_read() makes gcc load the percpu variable every time it is
* accessed while this_cpu_read_stable() allows the value to be cached.
* this_cpu_read_stable() is more efficient and can be used if its value
* is guaranteed to be valid across cpus. The current users include
* get_current() and get_thread_info() both of which are actually
* per-thread variables implemented as per-cpu variables and thus
* stable for the duration of the respective task.
*/
The _stable version of the macro allows the value to be cached, meaning it
does not force a reload.
However in the __switch_to() routine, the current point is changed. If the
compiler optimizes away the reload, then the resctrl_sched_in will look
at the previous task instead of the new current task. This explains why we
were not seeing the limit enforced on the benchmark.
To fix the problem, the resctrl_sched_in() function must use the
this_cpu_read() form of the macro. Given this is specific to the __switch_to
routine, we do not change get_current() but instead invoke the lower level
macro directly from __resched_sched_in(). This has no impact on any other
calls of get_current().
Note that the problem was detected when compiling the kernel with clang (v14)
but it could also happen with gcc.
Signed-off-by: Stephane Eranian <eranian@xxxxxxxxxx>
---
arch/x86/include/asm/resctrl.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 52788f79786fa..f791606a8cb15 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -54,6 +54,7 @@ static void __resctrl_sched_in(void)
struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
u32 closid = state->default_closid;
u32 rmid = state->default_rmid;
+ struct task_struct *cur;
u32 tmp;
/*
@@ -61,13 +62,15 @@ static void __resctrl_sched_in(void)
* Else use the closid/rmid assigned to this cpu.
*/
if (static_branch_likely(&rdt_alloc_enable_key)) {
- tmp = READ_ONCE(current->closid);
+ cur = this_cpu_read(pcpu_hot.current_task);
+ tmp = READ_ONCE(cur->closid);
if (tmp)
closid = tmp;
}
if (static_branch_likely(&rdt_mon_enable_key)) {
- tmp = READ_ONCE(current->rmid);
+ cur = this_cpu_read(pcpu_hot.current_task);
+ tmp = READ_ONCE(cur->rmid);
if (tmp)
rmid = tmp;
}
--
2.40.0.rc0.216.gc4246ad0f0-goog