Re: [EXT] Re: [PATCH 03/12] task_isolation: userspace hard isolation from kernel

From: Alex Belits
Date: Sun Mar 08 2020 - 03:17:07 EST


On Fri, 2020-03-06 at 17:00 +0100, Frederic Weisbecker wrote:
> On Wed, Mar 04, 2020 at 04:07:12PM +0000, Alex Belits wrote:
> > +#ifdef CONFIG_TASK_ISOLATION
> > +int try_stop_full_tick(void)
> > +{
> > + int cpu = smp_processor_id();
> > + struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> > +
> > + /* For an unstable clock, we should return a permanent error code. */
> > + if (atomic_read(&tick_dep_mask) & TICK_DEP_MASK_CLOCK_UNSTABLE)
> > + return -EINVAL;
> > +
> > + if (!can_stop_full_tick(cpu, ts))
> > + return -EAGAIN;
>
> Note that the stop_tick naming in nohz can be misleading. It means
> we actually leave the periodic mode and we enter in dynamic tick
> mode.
>
> In practice it means that the tick is delayed until the next event,
> which in the worst case may well be in 1 ms and in the best case never.
> So what you probably want to check instead is whether the tick has been
> entirely stopped (ie: we called hrtimer_cancel(&ts->sched_timer)).

This is part of a solution in which libtmc in userspace checks for
timers from another core before it confirms that the core entering
isolation can continue. Since some events may indeed still be pending,
it is up to userspace to tell the task that it is not really isolated
yet and should exit and re-enter isolation once everything is done, or
that the wait would be too long and should be treated as an error,
reported, etc.

Maybe it would be better if we checked the timer state and returned
-EAGAIN when it is still running at this point, and left it to userspace
to check for the cases where this did not work due to some race or
preemption. However, I still want to assume that as long as scheduling
things on isolated CPUs is not completely prohibited, something might
enable this timer at unexpected times while we are returning to
userspace, or even immediately after we get into userspace.
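
To illustrate the userspace side of this, here is a minimal sketch of
a bounded retry on -EAGAIN. It is not the libtmc code:
try_enter_isolation() is a hypothetical stand-in that only simulates a
tick that is still armed, and enter_isolation_with_retry() and the
choice of -ETIMEDOUT are likewise assumptions made for the example.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the kernel entry point; not a real API.
 * It simulates a tick that is still running for the first few calls.
 */
static int pending_events = 3;

static int try_enter_isolation(void)
{
	if (pending_events > 0) {
		pending_events--;
		return -EAGAIN;	/* tick still armed, try again later */
	}
	return 0;		/* tick was stopped at the time of the check */
}

/* Retry a bounded number of times, then treat the wait as an error. */
static int enter_isolation_with_retry(int max_attempts)
{
	int attempt, ret;

	for (attempt = 0; attempt < max_attempts; attempt++) {
		ret = try_enter_isolation();
		if (ret != -EAGAIN)
			return ret;	/* success or a permanent error */
	}
	return -ETIMEDOUT;	/* waited too long, report to the caller */
}
```

Even when this returns 0, the task would still have to re-check from
userspace, since the timer may be re-armed right after the check.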

>
> Thanks.
>
> > +
> > + tick_nohz_stop_sched_tick(ts, cpu);
> > + return 0;
> > +}
> > +#endif
> > +
> > static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
> > {
> > /*
> > --
> > 2.20.1
> >