Re: [PATCH v2] sched: rt: Make RT capacity aware

From: Qais Yousef
Date: Mon Feb 03 2020 - 12:17:52 EST


On 02/03/20 11:14, Steven Rostedt wrote:
> On Mon, 3 Feb 2020 14:27:14 +0000
> Qais Yousef <qais.yousef@xxxxxxx> wrote:
>
> > I don't see one right answer here. The current mechanism could certainly do
> > better; but it's not clear what better means without delving into system
> > specific details. I am open to any suggestions to improve it.
>
> The way I see this is that if there's no big cores available but little
> cores are, and the RT task has those cores in its affinity mask then
> the task most definitely should consider moving to the little core. The
> cpu_find() should return them!

I almost agree. I think the cpupri_find() could certainly do better if the task
is already running on a little core. It can fallback to the next best little
core if no bigger core is available.

I already started looking at pushing a patch to do that.

I'm torn about pushing a task already on a big core to a little core if it says
it wants it (down migration).

I guess since most tasks are fifo by default then one will starve if the other
one is a long running task (assuming same priority). But long running RT tasks
are not the common case, hence I wanted to hear about what use case this logic
hurts. I expect by default the big cores not to be over subscribed. Based on
some profiles I did at least running real Android apps I didn't see the RT
tasks were overwhelming the system.

In my view, the performance dip of sharing the big core would be less than
migrating the task to a little core momentarily then bring it back in to the
big core.

Because the following 2 big ifs must be satisfied first to starve an RT task:

1. We need all the big cores to be overloaded first.
2. The RT tasks on all the big cores are CPU hoggers (however we want
to define this)

And I think this needs more investigation.

>
> But, what we can do is to mark the little core that's running an RT
> task on a it that prefers bigger cores, as "rt-overloaded". This will
> add this core into the being looked at when another core schedules out
> an RT task. When that happens, the RT task on the little core will get
> pulled back to the big core.

I didn't think of using the rt-overloaded flag in this way. That would be
interesting to try.

>
> Here's what I propose.
>
> 1. Scheduling of an RT task that wants big cores, but has little cores
> in its affinity.
>
> 2. Calls cpu_find() which will look to place it first on a big core, if
> there's a core that is running a task that is lower priority than
> itself.
>
> 3. If all the big cores have RT tasks it can not preempt, look to find
> a little core.

I agree with the above.

>
> 4. If a little core is returned, and we schedule an RT task that
> prefers big cores on it, we mark it overloaded.
>
> 5. An RT task on a big core schedules out. Start looking at the RT
> overloaded run queues.
>
> 6. See that there's an RT task on the little core, and migrate it over.

I think the above should depend on the fitness of the cpu we currently run on.
I think we shouldn't down migrate, or at least investigate better down
migration makes more sense than keeping tasks running on the correct CPU where
they are.

> Note, this will require a bit more logic as the overloaded code wasn't
> designed for migration of running tasks, but that could be added.

I'm wary of overloading the meaning of rt.overloaded. Maybe I can convert it to
a bitmap so that we can encode the reason.

Let me see how complicated to write something up.

Thanks!

--
Qais Yousef