On Fri, Apr 08, 2016 at 12:34:48PM -0400, Chris Metcalf wrote:
> On 4/8/2016 9:56 AM, Frederic Weisbecker wrote:
> > On Wed, Mar 09, 2016 at 02:39:28PM -0500, Chris Metcalf wrote:
> > > TL;DR: Let's make an explicit decision about whether task isolation
should be "persistent" or "one-shot". Both have some advantages.
=====
An important high-level issue is how "sticky" task isolation mode is.
We need to choose one of these two options:
"Persistent mode": A task switches state to "task isolation" mode
(kind of a level-triggered analogy) and stays there indefinitely. It
can make a syscall, take a page fault, etc., if it wants to, but the
kernel protects it from incurring any further asynchronous interrupts.
This is the model I've been advocating for.
> >
> > But then in this mode, what happens when an interrupt triggers?
>
> So here I'm taking "interrupt" to mean an external, asynchronous
> interrupt, from another core or device, or asynchronously triggered
> on the local core, like a timer interrupt. By contrast, I use "exception"
> or "fault" to refer to synchronous, locally-triggered interruptions.

Ok.

> So for interrupts, the short answer is, it's a bug! :-)
>
> An interrupt could be a kernel bug, in which case we consider it a
> "true" bug. This could be a timer interrupt occurring even after the
> task isolation code thought there were none pending, or a hardware
> device that incorrectly distributes interrupts to a task-isolation
> cpu, or a global IPI that should be sent to fewer cores, or a kernel
> TLB flush that could be deferred until the task-isolation task
> re-enters the kernel later, etc. Regardless, I'd consider it a kernel
> bug. I'm sure there are more such bugs that we can continue to fix
> going forward; it depends on how arbitrary you want to allow code
> running on other cores to be. For example, can another core unload a
> kernel module without interrupting a task-isolation task? Not right now.
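
For reference, one way to see whether anything is still landing on an
isolated cpu is just to watch its column in /proc/interrupts. A minimal
sketch of such a watcher (this little tool is hypothetical, not part of
the series):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int cpu = argc > 1 ? atoi(argv[1]) : 1; /* isolated cpu column */
    char line[4096];
    FILE *f = fopen("/proc/interrupts", "r");

    if (!f || !fgets(line, sizeof(line), f))    /* skip CPUn header */
        return 1;
    while (fgets(line, sizeof(line), f)) {
        char *p = strchr(line, ':');
        long count = 0;
        int i;

        if (!p)
            continue;
        p++;                            /* step past the "NN:" prefix */
        for (i = 0; i <= cpu; i++)      /* walk to our cpu's column */
            count = strtol(p, &p, 10);
        if (count > 0)                  /* this cpu took interrupts */
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}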

> Or, it could be an application bug: the standard example is if you
> have an application with task-isolated cores that also does occasional
> unmaps on another thread in the same process, on another core. This
> causes TLB flush interrupts under application control. The
> application shouldn't do this, and we tell our customers not to build
> their applications this way. The typical way we encourage our
> customers to arrange this kind of "multi-threading" is by having a
> pure memory API between the task isolation threads and what are
> typically "control" threads running on non-task-isolated cores. The
> two types of threads just both mmap some common, shared memory but run
> as different processes.
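
For illustration, that arrangement is essentially the following: two
separate processes mapping the same shared object, so an munmap() in
the control process never sends a TLB-flush IPI at the isolated
process, since they don't share an address space. All names and sizes
here are made up for the sketch:

#include <fcntl.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <unistd.h>

struct channel {
    atomic_ulong head;          /* advanced by the control side */
    atomic_ulong tail;          /* advanced by the isolated side */
    unsigned long slots[1024];
};

/* Both the control process and the isolated process call this. */
static struct channel *map_channel(void)
{
    int fd = shm_open("/isol_chan", O_CREAT | O_RDWR, 0600);
    struct channel *ch;

    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(*ch)) < 0) {
        close(fd);
        return NULL;
    }
    ch = mmap(NULL, sizeof(*ch), PROT_READ | PROT_WRITE,
              MAP_SHARED, fd, 0);
    close(fd);
    return ch == MAP_FAILED ? NULL : ch;
}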

> So what happens if an interrupt does occur?
>
In the "base" task isolation mode, you just take the interrupt, then
wait to quiesce any further kernel timer ticks, etc., and return to
the process. This at least limits the damage to being a single
interruption rather than potentially additional ones, if the interrupt
also caused timers to get queued, etc.
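
If I've understood the series, the mechanism being described sits on
the return-to-userspace path and is conceptually something like the
sketch below. The helper names are my paraphrase, not necessarily the
real ones, and the stubs stand in for the real "are timers/ticks still
pending on this cpu?" checks:

#include <stdbool.h>
#include <unistd.h>

static bool task_isolation_ready(void)
{
    /* Stand-in: the real check would ask whether the tick and any
     * queued timers on this cpu have quiesced. */
    return true;
}

static void task_isolation_wait(void)
{
    /* Stand-in: the real code would idle until things go quiet. */
    usleep(100);
}

/* Called before returning to the isolated task after an interrupt. */
static void exit_to_usermode_quiesce(void)
{
    /* Loop so that one stray interrupt does not turn into a stream
     * of follow-on timer ticks hitting the isolated task. */
    while (!task_isolation_ready())
        task_isolation_wait();
}

int main(void)
{
    exit_to_usermode_quiesce();
    return 0;
}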

So if we take an interrupt that we didn't expect, we want to wait some more
at the end of that interrupt for things to quiesce some more?

That doesn't look right. Things should be quiesced once and for all on
return from the initial prctl() call. We can't even expect to quiesce more
in case of interruptions; the tick can't be forced off anyway.
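
For concreteness, the usage model I have in mind is the one below:
prctl() returns only once everything is quiesced, and nothing more is
expected afterwards. PR_SET_TASK_ISOLATION and its flag come from this
patch series (they are not in mainline), so the constants here are
assumptions for the sketch:

#define _GNU_SOURCE
#include <sched.h>
#include <sys/prctl.h>

#ifndef PR_SET_TASK_ISOLATION
#define PR_SET_TASK_ISOLATION       48  /* value assumed from the series */
#define PR_TASK_ISOLATION_ENABLE    1
#endif

int main(void)
{
    cpu_set_t set;

    /* Pin to a nohz_full core first; cpu 3 is an arbitrary pick. */
    CPU_ZERO(&set);
    CPU_SET(3, &set);
    if (sched_setaffinity(0, sizeof(set), &set) < 0)
        return 1;

    /* Should quiesce the tick etc. once and for all before returning. */
    if (prctl(PR_SET_TASK_ISOLATION, PR_TASK_ISOLATION_ENABLE, 0, 0, 0) < 0)
        return 1;

    for (;;) {
        /* ... pure userspace fast path, no kernel entries ... */
    }
}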

> Or, you can enable "strict" mode, and then you get hard isolation
> without the ability to get in and out of the kernel at all: the kernel
> just kills you if you try to leave hard isolation other than by an
> explicit prctl().

That would be an extreme strict mode, yeah. We can still add such a mode
later if any user requests it.
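
In that mode, the usage sketch above would become something like the
following; the _STRICT flag name and value are my assumption, since I
don't have the exact constants from the series in front of me:

#include <sys/prctl.h>

#ifndef PR_SET_TASK_ISOLATION
#define PR_SET_TASK_ISOLATION       48  /* assumed, as above */
#define PR_TASK_ISOLATION_ENABLE    1
#define PR_TASK_ISOLATION_STRICT    2   /* illustrative flag value */
#endif

int main(void)
{
    /* From here on, any kernel entry other than the disabling
     * prctl() below is fatal to the task. */
    if (prctl(PR_SET_TASK_ISOLATION,
              PR_TASK_ISOLATION_ENABLE | PR_TASK_ISOLATION_STRICT,
              0, 0, 0) < 0)
        return 1;

    /* ... pure userspace work ... */

    /* The one sanctioned way back out of hard isolation. */
    return prctl(PR_SET_TASK_ISOLATION, 0, 0, 0, 0);
}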

(I'll reply to the rest of the email soonish.)