Re: [RFD] sched/deadline: Support single CPU affinity

From: luca abeni
Date: Thu Nov 10 2016 - 09:36:10 EST


Hi Henrik,

On Thu, 10 Nov 2016 13:56:35 +0100
Henrik Austad <henrik@xxxxxxxxx> wrote:

> On Thu, Nov 10, 2016 at 01:38:40PM +0100, luca abeni wrote:
> > Hi Henrik,
>
> Hi Luca,
>
> > On Thu, 10 Nov 2016 13:21:00 +0100
> > Henrik Austad <henrik@xxxxxxxxx> wrote:
> > > On Thu, Nov 10, 2016 at 09:08:07AM +0100, Peter Zijlstra wrote:
> > [...]
> > > > We define the time to fail as:
> > > >
> > > > ttf(t) := t_d - t_b; where
> > > >
> > > > t_d is t's absolute deadline
> > > > t_b is t's remaining budget
> > > >
> > > > This is the last possible moment we must schedule this task such
> > > > that it can complete its work and not miss its deadline.
> > >
> > > To elaborate a bit on this (this is a modified LLF approach if my
> > > memory serves):
> > >
> > > You have the dynamic time-to-failure (TtF), i.e. as the task
> > > progresses (scheduled to run), the relative time-to-failure will
> > > remain constant. This can be used to compare thasks to a running
> > > task and should minimize the number of calculations required.
> > >
> > > Then you have the static Time-of-failure (ToF, which is the
> > > absoulte time when a task will no longer be able to meet its
> > > deadline. This is what you use for keeping a sorted list of tasks
> > > in the runqueue. As this is a fixed point in time, you do not
> > > have to dynamically update or do crazy calculation when
> > > inserting/removing threads from the rq.
> >
> > Sorry, I am missing something here: if ttf is defined as
> > ttf_i = d_i - q_i
>
> So I picked the naming somewhat independently of Peter, his approach
> is the _absolute_ time of failure, the actual time X, irrespective of
> the task running or not.
>
> I added 2 different measures for the same thing:
>
> * ToF:
> The absolute time of failure is the point in time when the task will
> no longer be able to meet its deadline. If a task is scheduled and is
> running on a CPU, this value will move forward at the speed of
> execution. i.e. when the task is running, this value is changing.
> When the task is waiting in the runqueue, this value is constant.
Ah, ok... So, if I understand well you ToF is Peter's ttf... Right?


> TtF:
> The relative time to failure is the value that is tied to the local
> CPU so to speak. When a task is running, this value is constant as it
> is the remaining time until the task is no longer able to meet its
> deadline. When the task is enqueued, this value will steadily
> decrease as it draws closer to the time when it will fail.
So, if I understand weel, TtF = ToF - current time... Right? I think
this is often called "laxity" or "slack time". No?

[...]
> > > > If we then augment the regular EDF rules by, for local tasks,
> > > > considering the time to fail and let this measure override the
> > > > regular EDF pick when the time to fail can be overran by the EDF
> > > > pick.
> > >
> > > Then, if you do this - do you need to constrict this to a local
> > > CPU? I *think* you could do this in a global scheduler if you use
> > > ToF/TtF for all deadline-tasks, I think you should be able to
> > > meet deadlines.
> > I think the ToF/TtF scheduler will be equivalent to LLF (see
> > previous emails)... Or am I misunderstanding something? (see above)
> > And LLF is not optimal on multiple CPUs, so I do not think it will
> > be able to meet deadlines if you use it as a global scheduler.
>
> I think I called it Earliest Failure First (I really wanted to call
> it failure-driven scheduling but that implied a crappy scheduler ;)
Ok, but... How is it different from LLF?
In my understanding (but, again, maybe I am missing something) ToF and
TtF are just a way to implement LLF more efficiently (because, as you
notice, ToF does not change when a task is executing and TtF does not
change when the task is executing).


Luca



> LLF is prone to high task-switch count when multiple threads gets
> close to 0 laxity. But as I said, it's been a while since I last
> worked through the theory, so I have some homework to do before
> arguing too hard about this.
>
> > > I had a rant about this way back [1,2 Sec 11.4], I need to sit
> > > down and re-read most of it, it has been a few too many years,
> > > but the idea was to minimize the number of context-switches
> > > (which LLF is prone to get a lot of) as well as minimize the
> > > computational overhead by avoiding re-calculating
> > > time-of-failre/time-to-failre a lot.
> > > > That is, when:
> > > >
> > > > now + left_b > min(ttf)
> > >
> > > Why not just use ttf/tof for all deadline-tasks? We have all the
> > > information available anyway and it would probably make the
> > > internal logic easier?
> > I think LLF causes more preemptions and migrations than EDF.
>
> yes, it does, which is why you need to adjust LLF to minimize the
> number of task-switches.
>
> -Henrik