Re: [PATCH v7 07/11] arch/x86: enable task isolation functionality
From: Chris Metcalf
Date: Tue Sep 29 2015 - 13:42:32 EST
On 09/28/2015 06:43 PM, Andy Lutomirski wrote:
> Why are we treating alarms as something that should defer entry to
> userspace? I think it would be entirely reasonable to set an alarm
> for ten minutes, ask for isolation, and then think hard for ten
> minutes.
>
> A bigger issue would be if there's an RT task that asks for isolation
> and a bunch of other stuff (most notably KVM hosts) running with
> unconstrained affinity at full load. If task_isolation_enter always
> sleeps, then your KVM host will get scheduled, and it'll ask for a
> user return notifier on the way out, and you might just loop forever.
> Can this happen?
task_isolation_enter() doesn't sleep - it spins. This is intentional,
because the point is that there should be nothing else that
could be scheduled on that cpu. We're just waiting for any
pending kernel management timer interrupts to fire.
In any case, you normally wouldn't have a KVM host running
on an isolcpus, nohz_full cpu, unless it was the only thing
running there, I imagine (just as would be true for any other
host process).
> ISTM something's suboptimal with the inner workings of all this if
> task_isolation_enter needs to sleep to wait for an event that isn't
> scheduled for the immediate future (e.g. already queued up as an
> interrupt).
Scheduling a timer for 10 minutes away is typically done by
scheduling timers for the max timer granularity (which could
be just a few seconds) and then waking up a couple of hundred
times between now and now+10 minutes. Doing this breaks
the task isolation guarantee, so we can't return to userspace
while something like that is pending. You'd have to do it
by polling in userspace to avoid the unexpected interrupts.
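
To make the arithmetic and the polling alternative above concrete, here is
a minimal user-space sketch. It is not code from the patch series. The
helper names (rearm_count, now_ns, poll_until) are hypothetical, and the
3-second maximum timer granularity used in the note below is an assumed
value chosen only to match the "few seconds" figure. Whether
clock_gettime() avoids entering the kernel depends on the platform's vDSO
support, which should be verified on the target.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Number of timer interrupts needed to cover total_ns when the hardware
 * timer can be programmed at most max_ns ahead (hypothetical limit).
 */
static uint64_t rearm_count(uint64_t total_ns, uint64_t max_ns)
{
    return (total_ns + max_ns - 1) / max_ns; /* ceiling division */
}

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Busy-poll in userspace until delay_ns has elapsed, instead of arming a
 * kernel timer. Returns the elapsed time actually observed.
 */
static uint64_t poll_until(uint64_t delay_ns)
{
    uint64_t start = now_ns();
    uint64_t now;

    do {
        now = now_ns();
    } while (now - start < delay_ns);

    return now - start;
}
```

With the assumed 3-second limit, rearm_count() for a 10-minute alarm
(600 seconds) gives 200 interrupts, i.e. "a couple of hundred" wakeups.
poll_until() trades those interrupts for a core that spins on the clock,
which is the usual cost of doing the wait in userspace.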
I suppose if your hardware supported it, you could imagine
a mode where userspace can request an alarm a specific
amount of time in the future, and the task isolation code
would then ignore an alarm that was going off at that
specific time. But I'm not sure what hardware does support
that (I know tile uses the "few seconds and re-arm" model),
and it seems like a pretty corner use-case. We could
certainly investigate adding such support later, but I don't see
it as part of the core value proposition for task isolation.
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com