Re: [EXT] Re: [PATCH 03/12] task_isolation: userspace hard isolation from kernel
From: Alex Belits
Date: Sun Mar 08 2020 - 01:07:16 EST
On Fri, 2020-03-06 at 16:26 +0100, Frederic Weisbecker wrote:
> On Wed, Mar 04, 2020 at 04:07:12PM +0000, Alex Belits wrote:
> > do {
> > + /* Make sure we are reading up to date values
> > */
> > + smp_mb();
> > + ind = ind_counter & 1;
> > + snprintf(buf_prefix, sizeof(buf_prefix),
> > + "isolation %s/%d/%d (cpu %d)",
> > + desc->comm[ind], desc->tgid[ind],
> > + desc->pid[ind], cpu);
> > + desc->warned[ind] = true;
> > + ind_counter_old = ind_counter;
> > + /* Record the warned flag, then re-read
> > descriptor */
> > + smp_mb();
> > + ind_counter = atomic_read(&desc->curr_index);
> > + /*
> > + * If the counter changed, something was
> > updated, so
> > + * repeat everything to get the current data
> > + */
> > + } while (ind_counter != ind_counter_old);
> > + }
>
> So the need to log the fact we are sending an event to a remote CPU
> that *may be*
> running an isolated task makes things very complicated and even racy.
The only reason why the result of this would be wrong, is the race
between multiple causes of breaking isolation of the same task or race
with the task exiting isolation on its own at the same time (and
possibly re-entering it, or even another task entering on the same CPU
core). This is possible, however for all practical purposes we are
still logging an isolation-breaking event that happened while a real
isolated task was running. We should keep in mind the possibility that
this isolation-breaking event could be preempted by another isolation
breaking cause, and all of them will be recorded even if only one ended
up causing fast_task_isolation_cpu_cleanup() to be called on the target
CPU core.
> How bad would it be to only log those interruptions once they land on
> the target?
>
For the purpose of determining the cause of isolation breaking -- very
bad. Early versions of this made people tear their hair out trying to
divine, where some IPI came from. Then there was a monstrosity that did
some rather unsafe manipulations with task_struct, however it was only
suitable as a temporary mechanism for development. This version keeps
things consistent and only shows up when there is something that should
be reported.
> Thanks.
Thanks!
--
Alex