Re: [PATCH v7 03/11] task_isolation: support PR_TASK_ISOLATION_STRICT mode

From: Chris Metcalf
Date: Thu Oct 01 2015 - 15:26:47 EST


On 09/29/2015 02:00 PM, Andy Lutomirski wrote:
On Tue, Sep 29, 2015 at 10:57 AM, Chris Metcalf <cmetcalf@xxxxxxxxxx> wrote:
On 09/29/2015 01:46 PM, Andy Lutomirski wrote:
On Tue, Sep 29, 2015 at 10:35 AM, Chris Metcalf <cmetcalf@xxxxxxxxxx> wrote:
Well, the most interesting category is exceptions that don't actually
trigger a signal (e.g. a minor page fault), since those cause
significant issues for task-isolation processes (kernel-induced
jitter) but aren't otherwise user-visible, much as an undiscovered
syscall in a third-party library can cause unexpected jitter.
Would it make sense to exempt the exceptions that result in signals?
After all, those are detectable even without your patches. Going
through all of the exception types:

divide_error, overflow, invalid_op, coprocessor_segment_overrun,
invalid_TSS, segment_not_present, stack_segment, alignment_check:
these all send signals anyway.

double_fault is fatal.

bounds: MPX faults can be silently fixed up, and those will need
notification. (Or user code should know not to do that, since it
requires an explicit opt in, and user code can flip it back off to get
the signals.)

general_protection: always signals except in vm86 mode.

int3: silently fixed if uprobes are in use, but I don't think
isolation cares about that. Otherwise signals.

debug: The perf hw_breakpoint can result in silent fixups, but those
require explicit opt-in from the admin. Otherwise, unless there's a
bug or a debugger, the user will get a signal. (As a practical
matter, the only interesting case is the undocumented ICEBP
instruction.)

math_error, simd_coprocessor_error: send a signal.

spurious_interrupt_bug: Irrelevant on any modern CPU AFAIK. We should
just WARN if this hits.

device_not_available: If you're using isolation without an FPU, you
have bigger problems.

page_fault: Needs notification.

NMI, MCE: arguably these should *not* notify, or at least not fatally.

So maybe a better approach would be to explicitly notify for the
relevant entries: IRQs, non-signalling page faults, and non-signalling
MPX fixups. Other arches would have their own lists, but they're
probably also short except for emulated instructions.

IRQs should get notified via the task_isolation_debug boot flag;
the intent is that they should never get delivered to nohz_full
cores anyway, so we produce a console backtrace if the boot
flag is enabled. This isn't tied to having a task running with
TASK_ISOLATION enabled, since it just shouldn't ever happen.

OK, I like that. In that case, maybe NMI and MCE should be in a
similar category. (IOW if a non-fatal MCE happens and the debug param
is set, we could warn, assuming that anyone is willing to write the
code. Doing printk from MCE is not entirely trivial, although it's
less bad in recent kernels.)

For now I will stay away from tampering with the NMI/MCE
handlers, though if they turn out to be the cause of mysterious
latencies in task-isolation applications in the future, it will
likely make sense to add some debugging there.

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
