Re: [PATCH v7 02/11] task_isolation: add initial support

From: Chris Metcalf
Date: Fri Oct 02 2015 - 13:16:17 EST


On 10/01/2015 05:20 PM, Thomas Gleixner wrote:
> On Thu, 1 Oct 2015, Chris Metcalf wrote:
>> But first I want to address the question of the basic semantics
>> of the patch series. I wrote up a description of why it's useful
>> in my email yesterday:
>>
>> https://lkml.kernel.org/r/560C4CF4.9090601@xxxxxxxxxx
>>
>> I haven't directly heard from you as to whether you buy the
>> basic premise of "hard isolation" in terms of protecting tasks
>> from all kernel interrupts while they execute in userspace.
>
> Just for the record. The first serious initiative to solve that
> problem started here in my own company when I guided Frederic through
> the endeavour of figuring out what needs to be done to achieve
> that. That was the assignment of his master's thesis, which I gave him.

Thanks for that background. I didn't know you had gotten
Frederic started down that path originally.

>> So I first want to address what is effectively the API concern that
>> you raised, namely that you're concerned that there is a wait
>> loop in the implementation.
>
> That wait loop is just a placeholder for the underlying, more serious
> concern I have with this whole approach. And I raised that concern
> several times in the past and I'm happy to do so again.
>
> The people working on this, especially you, are just dead set on
> achieving a certain functionality by jamming half-baked mechanisms into
> the kernel and especially into the low-level entry/exit code. And
> that's something which really annoys me, simply because you refuse to
> tackle the problems which were identified as needing to be solved 5+
> years ago when Frederic did his thesis.

I think you raise a good point. I still claim my arguments are
plausible, but you may be right that this is an instance where
forcing a different approach is better for the kernel community
as a whole.

Given that, what would you think of the following two changes
to my proposed patch series:

1. Rather than spinning in a busy loop if timers are pending,
we reschedule if more than one task is ready to run. This
directly targets the "architected" problem with the scheduler
tick, rather than sweeping up the scheduler tick and any other
timers into the one catch-all of "any timer ready to fire".
(We can use sched_can_stop_tick() to check the case where
other tasks can preempt us.) This would then provide part
of the semantics of the task-isolation flag. The other part is
running whatever quiescing code we can to avoid the various ways
tasks might get interrupted later (lru_add_drain(),
quiet_vmstat(), etc.), code that is not appropriate to run
unconditionally for tasks that aren't trying to be isolated;
see the sketch just after this list.

2. Remove the tie between disabling the 1 Hz max deferment
and task isolation per se. Instead add a boot flag (e.g.
"debug_1hz_tick") that lets us turn off the 1 Hz tick to make it
easy to experiment with the negative effects of the missing
tick, and in parallel to learn which timer interrupts are
firing "on purpose" rather than just due to the 1 Hz tick, so
that we can work on eliminating them as well.
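
To make the quiesce half of #1 concrete, here is a minimal
sketch of what could run on the exit-to-userspace path. The
task_isolation_enter() name and its placement are my own
assumptions for illustration; lru_add_drain() and
quiet_vmstat() are the existing interfaces named above:

/*
 * Hypothetical hook on the exit-to-userspace path for tasks
 * that requested isolation (sketch only, not the actual patch).
 */
#include <linux/swap.h>		/* lru_add_drain() */
#include <linux/vmstat.h>	/* quiet_vmstat() */

static void task_isolation_enter(void)
{
	/*
	 * Drain the per-cpu LRU pagevecs now, so that no later
	 * lru_add_drain_all() needs to interrupt this cpu.
	 */
	lru_add_drain();

	/*
	 * Fold this cpu's vmstat diffs into the global counters
	 * so the periodic vmstat_update work stays quiet.
	 */
	quiet_vmstat();
}

And for #2, a sketch of how the boot flag might be wired up;
the flag name is from the proposal above, but the variable name
and the exact hook site in the tick-sched code are assumptions:

#include <linux/init.h>
#include <linux/cache.h>
#include <linux/types.h>

/* Set by the proposed "debug_1hz_tick" boot argument. */
static bool debug_1hz_tick_off __read_mostly;

static int __init debug_1hz_tick_setup(char *str)
{
	debug_1hz_tick_off = true;
	return 0;
}
early_param("debug_1hz_tick", debug_1hz_tick_setup);

/*
 * Then, wherever the tick-sched code applies the 1 Hz max
 * deferment via scheduler_tick_max_deferment(), skip it when
 * the flag is set, along the lines of:
 *
 *	if (!debug_1hz_tick_off)
 *		time_delta = min_t(u64, time_delta,
 *				   scheduler_tick_max_deferment());
 */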

For #1, I'm not sure if it's better to hack up the scheduler's
pick_next_task callback methods to avoid task-isolation tasks
when other tasks are also available to run, or just to observe
that there are additional tasks ready to run during exit to
userspace, and yield the cpu to allow those other tasks to run.
The advantage of doing it at exit to userspace is that we can
easily yield in a loop, notice when the task seems not to be
making forward progress, and generate a suitable warning; it
also keeps a lot of task-isolation logic out of the core
scheduler code, which may be a plus.
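
As a rough sketch of that second option (yield at exit to
userspace), assuming a hypothetical task_isolation_wait() hook
and an arbitrary warning threshold; sched_can_stop_tick() is
the existing NO_HZ_FULL helper mentioned above:

#include <linux/sched.h>	/* schedule(), sched_can_stop_tick() */
#include <linux/kernel.h>	/* WARN_ONCE() */

/* Arbitrary number of yields before we assume we are stuck. */
#define ISOLATION_YIELD_WARN	100

static void task_isolation_wait(void)
{
	int yields = 0;

	/* Yield while another runnable task keeps the tick alive. */
	while (!sched_can_stop_tick()) {
		schedule();
		if (++yields == ISOLATION_YIELD_WARN)
			WARN_ONCE(1, "%s/%d: no forward progress toward isolation\n",
				  current->comm, current->pid);
	}
}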

With these changes, and booting with the "debug_1hz_tick"
flag, I'm seeing a couple of timer ticks hit my task-isolation
task in the first 20 ms or so, and then it quiesces. I plan
to figure out what is triggering those interrupts and how to
fix them. My hope is that in
parallel with that work, other folks can be working on how to
fix problems that occur more silently with the scheduler
tick max deferment disabled; I'm also happy to work on those
problems to the extent that I understand them (and I'm
always happy to learn more).

As part of the patch series I'd extend the proposed
task_isolation_debug flag to also track timer scheduling
events against task-isolation tasks that are ready to run
in userspace (no other runnable tasks).
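
Something like the following hedged sketch is what I have in
mind for that debug hook. It would need to live in scheduler
code for cpu_curr(), and it assumes a per-task
TIF_TASK_ISOLATION thread flag such as this series proposes;
the hook placement is illustrative only:

#include <linux/sched.h>
#include <linux/printk.h>
#include "sched.h"	/* kernel/sched/sched.h, for cpu_curr() */

/*
 * Hypothetical hook, called e.g. where a timer is queued on a
 * remote cpu, to flag events aimed at an isolated task.
 */
void task_isolation_debug(int cpu)
{
	struct task_struct *p = cpu_curr(cpu);

	if (p && test_tsk_thread_flag(p, TIF_TASK_ISOLATION))
		pr_warn("timer event scheduled on isolated cpu %d (%s/%d)\n",
			cpu, p->comm, p->pid);
}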

What do you think of this approach?

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
