Re: [PATCHv4 00/12] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems
From: Vincent Guittot
Date: Tue Jul 31 2018 - 08:11:56 EST
On Mon, 9 Jul 2018 at 17:08, Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote:
>
> On Fri, Jul 06, 2018 at 12:18:27PM +0200, Vincent Guittot wrote:
> > Hi Morten,
> >
> > On Wed, 4 Jul 2018 at 12:18, Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote:
> > >
> > > On asymmetric cpu capacity systems (e.g. Arm big.LITTLE) it is crucial
> > > for performance that cpu intensive tasks are aggressively migrated to
> > > high capacity cpus as soon as those become available. The capacity
> > > awareness tweaks already in the wake-up path can't handle this, as such
> > > tasks might run or be runnable forever. If they happen to be placed on a
> > > low capacity cpu from the beginning, they are stuck there forever even
> > > though high capacity cpus may have become available in the meantime.
> > >
> > > To address this issue, this patch set introduces a new "misfit"
> > > load-balancing scenario in periodic/nohz/newly idle balance which tweaks
> > > the load-balance conditions to ignore load per capacity in certain
> > > cases. Since misfit tasks are commonly running alone on a cpu, more
> > > aggressive active load-balancing is needed too.
> > >
> > > The fundamental idea of this patch set has been in Android kernels for a
> > > long time and is absolutely essential for consistent performance on
> > > asymmetric cpu capacity systems.
> > >
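
For reference, the "misfit" classification described above boils down to
comparing a task's tracked (PELT) utilization against the capacity of the
cpu it runs on, with some headroom. A minimal, self-contained sketch in C
(the identifiers and the ~20% margin are illustrative, not necessarily what
the patches use):

        /*
         * Illustrative sketch only -- not the patch code. A task "fits"
         * a cpu when its utilization, plus ~20% headroom, stays below the
         * cpu's capacity (both on the usual 0..1024 scale).
         */
        #define SCHED_CAPACITY_SCALE    1024UL
        #define CAPACITY_MARGIN         1280UL  /* 1280/1024 ~= 1.25, i.e. ~20% headroom */

        static inline int task_fits_capacity(unsigned long task_util,
                                             unsigned long cpu_capacity)
        {
                return cpu_capacity * SCHED_CAPACITY_SCALE >
                        task_util * CAPACITY_MARGIN;
        }

        /*
         * Example: on a LITTLE cpu of capacity 462, a task at util 300
         * still fits (473088 > 384000), while a task at util 600 does not.
         */

Tasks failing such a check are the ones the periodic/nohz/newly idle
balance paths would then try to pull towards a higher capacity cpu.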
> >
> > As already said, I'm not convinced by the proposal, which seems quite
> > complex and also adds a somewhat arbitrary and fixed power
> > management policy by deciding which tasks can or cannot go on big cores,
> > whereas there are other frameworks to make such decisions, like EAS or
> > cgroups.
>
> The misfit patches are a crucial part of the EAS solution but they also
> make sense for some users on their own without an energy model. This is
> why they are posted separately.

EAS needs the scheduler to move long-running tasks onto big cores
(especially when overloaded), and the misfit task patchset is just one
proposal for how to do that.
>
> We have already discussed at length why the patches are needed and why
> they look the way they do in this thread:
>
> https://lore.kernel.org/lkml/CAKfTPtD4skW_3SAk--vBEC5-m1Ua48bjOQYS0pDqW3nPSpsENg@xxxxxxxxxxxxxx/
>
> > Furthermore, there is already something similar in the kernel
> > with SD_ASYM_PACKING and IMO it would be better to improve that
> > feature (if needed) instead of adding a new one which often does
> > similar things.
>
> As said in the previous thread, while it might look similar, it isn't.
> SD_ASYM_PACKING isn't based on utilization, which is the key metric used
> for EAS, schedutil, util_est, and util_clamp. SD_ASYM_PACKING serves a
> different purpose (see the previous thread for details).
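
To make the distinction concrete: SD_ASYM_PACKING packs work towards cpus
with a higher static, per-cpu priority, so the decision never looks at how
big the task is, whereas the misfit path only triggers once the task's
measured utilization has outgrown its current cpu. Roughly (an illustrative
simplification, not the actual kernel helpers):

        /* SD_ASYM_PACKING: destination chosen from a static cpu ranking. */
        static inline int asym_prefer(int prio_dst, int prio_src)
        {
                return prio_dst > prio_src;     /* task size plays no role */
        }

        /* Misfit: act only once the task has outgrown its current cpu. */
        static inline int misfit_candidate(unsigned long task_util,
                                           unsigned long cpu_capacity)
        {
                return task_util * 1280UL >= cpu_capacity * 1024UL;
        }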
>
> > I have rerun your tests and got the same results as with the misfit
> > task patchset on my hikey960 with the SD_ASYM_PACKING feature, for both
> > the legacy b.L topology and a fake DynamIQ topology. And it gives
> > better performance when the pinned tasks are short and the scheduler
> > has to wait for the tasks to increase their utilization before getting
> > a chance to migrate to a big core.
>
> Right, the test cases are quite simple and could be served better by
> SD_ASYM_PACKING. As we already discussed in that thread, that is due to
> the PELT lag, but this is the cost we have to pay if we don't have
> additional information about the requirements of the task and we don't
> want to default to big-first with all its implications.
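
The size of that lag can be estimated. A back-of-envelope sketch, assuming
the standard 32ms PELT half-life and capacity-invariant utilization (the
numbers are illustrative):

        /*
         * A task running continuously from util 0 reaches roughly a
         * fraction (1 - 0.5^(t/32ms)) of its cpu's capacity after t ms,
         * so it only crosses the ~80% "no longer fits" point (1024/1280)
         * after about 74ms of solid running -- hence the delay seen for
         * short pinned tasks.
         */
        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
                double halflife_ms = 32.0;
                double fit_threshold = 1024.0 / 1280.0; /* ~0.8 */
                double t = halflife_ms * log2(1.0 / (1.0 - fit_threshold));

                printf("~%.0f ms of continuous running before misfit\n", t);
                return 0;
        }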
>
> We have covered all this in the thread in early April.
>
> > Then, I have tested SD_ASYM_PACKING with the EAS patchset and they work
> > together for the b.L and DynamIQ topologies.
>
> Could you provide some more details about your evaluation? It probably
> works well for some use-cases but it isn't really designed for what we
> need for EAS.
>
> Morten