On Tue, Jul 28, 2015 at 03:49:36PM -0400, Chris Metcalf wrote:
The existing nohz_full mode is designed as a "soft" isolation modeWe are doing this at the process level but the isolation works on
that makes tradeoffs to minimize userspace interruptions while
still attempting to avoid overheads in the kernel entry/exit path,
to provide 100% kernel semantics, etc.
However, some applications require a "hard" commitment from the
kernel to avoid interruptions, in particular userspace device
driver style applications, such as high-speed networking code.
This change introduces a framework to allow applications
to elect to have the "hard" semantics as needed, specifying
prctl(PR_SET_CPU_ISOLATED, PR_CPU_ISOLATED_ENABLE) to do so.
Subsequent commits will add additional flags and additional
semantics.
the CPU scope... Now I wonder if prctl is the right interface.
That said the user is rather interested in isolating a task. The CPU
being the backend eventually.
For example if the task is migrated by accident, we want it to be
warned about that. And if the isolation is done on the CPU level
instead of the task level, this won't happen.
I'm also afraid that the naming clashes with cpu_isolated_map,
although it could be a subset of it.
So probably in this case we should consider talking about task rather
than CPU isolation and change naming accordingly (sorry, I know I
suggested cpu_isolation.c, I guess I had to see the result to realize).
We must sort that out first. Either we consider isolation on the task
level (and thus the underlying CPU by backend effect) and we use prctl().
Or we do this on the CPU level and we use a specific syscall or sysfs
which takes effect on any task in the relevant isolated CPUs.
What do you think?
The kernel must be built with the new CPU_ISOLATED Kconfig flagIt's not about time :-)
to enable this mode, and the kernel booted with an appropriate
nohz_full=CPULIST boot argument. The "cpu_isolated" state is then
indicated by setting a new task struct field, cpu_isolated_flags,
to the value passed by prctl(). When the _ENABLE bit is set for a
task, and it is returning to userspace on a nohz_full core, it calls
the new cpu_isolated_enter() routine to take additional actions
to help the task avoid being interrupted in the future.
Initially, there are only three actions taken. First, the
task calls lru_add_drain() to prevent being interrupted by a
subsequent lru_add_drain_all() call on another core. Then, it calls
quiet_vmstat() to quieten the vmstat worker to avoid a follow-on
interrupt. Finally, the code checks for pending timer interrupts
and quiesces until they are no longer pending. As a result, sys
calls (and page faults, etc.) can be inordinately slow. However,
this quiescing guarantees that no unexpected interrupts will occur,
even if the application intentionally calls into the kernel.
Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>
---
arch/tile/kernel/process.c | 9 ++++++
include/linux/cpu_isolated.h | 24 +++++++++++++++
include/linux/sched.h | 3 ++
include/uapi/linux/prctl.h | 5 ++++
kernel/context_tracking.c | 3 ++
kernel/sys.c | 8 +++++
kernel/time/Kconfig | 20 +++++++++++++
kernel/time/Makefile | 1 +
kernel/time/cpu_isolated.c | 71 ++++++++++++++++++++++++++++++++++++++++++++
The timer is only a part of the isolation.
Moreover "isolatED" is a state. The filename should reflect the process. "isolatION" would
better fit.
kernel/task_isolation.c maybe or just kernel/isolation.c
I think I prefer the latter because I'm not only interested in that task
hard isolation feature, I would like to also drive all the general isolation
operations from there (workqueue affinity, rcu nocb, ...).
+#ifdef CONFIG_CPU_ISOLATEDI'm still uncomfortable with that. A wake up model could work?
+void cpu_isolated_wait(void)
+{
+ set_current_state(TASK_INTERRUPTIBLE);
+ _cpu_idle();
+ set_current_state(TASK_RUNNING);
+}
+extern void cpu_isolated_enter(void);And all the naming should be about task as well, if we take that task direction.
+extern void cpu_isolated_wait(void);
+#else
+static inline bool is_cpu_isolated(void) { return false; }
+static inline void cpu_isolated_enter(void) { }
+#endif
+Can't we add a new flag to tsk->flags? There seem to be some values remaining.
+#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 04b5ada460b4..0bb248385d88 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1776,6 +1776,9 @@ struct task_struct {
unsigned long task_state_change;
#endif
int pagefault_disabled;
+#ifdef CONFIG_CPU_ISOLATED
+ unsigned int cpu_isolated_flags;
+#endif