Re: [PATCH v2] sched/fair: Make SCHED_IDLE entity be preempted in strict hierarchy

From: Josh Don
Date: Tue Jul 09 2024 - 14:29:14 EST


On Mon, Jul 8, 2024 at 7:28 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jul 08, 2024 at 02:47:34PM +0200, Vincent Guittot wrote:
> > On Mon, 8 Jul 2024 at 14:02, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
<snip>
> > > The 'problem' is that the whole BATCH thing isn't cgroup aware ofcourse,
> > > but I'm not sure we want to go fix that -- esp. not in this patch.
> > >
> > > Hmm?
> >
> > Good question, but do we want to make SCHED_BATCH tasks behave
> > differently than SCHED_IDLE tasks in a group in this case ?
>
> I suspect we'll have to. People added the idle-cgroup thing, but never
> did the same for batch. With the result that they're now fundamentally
> different.

It isn't clear to me that cgroup batch behavior is really a useful
thing that is worth adding. After the EEVDF changes, the only real
difference between normal and batch is that batch tasks don't preempt normal
on wakeup. Contrast that to idle, where we have a pretty meaningful
difference from SCHED_NORMAL, especially with sched_idle_cpu() feeding
into wakeup placement and load balancing.
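
To make the wakeup distinction concrete, here is a toy model in C of the
policy checks as I read them. This is a simplified sketch, not the kernel's
code: the enum, the function name wakeup_preempts() and the waking_wins_pick
flag (a stand-in for "the waking entity would win the EEVDF pick") are all
made up for illustration, and it ignores cgroup hierarchy entirely.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three fair-class policies discussed here. */
enum policy { POLICY_NORMAL, POLICY_BATCH, POLICY_IDLE };

/*
 * Toy model of the wakeup-preemption decision.
 * Returns true if the waking task should preempt the running task right
 * away at wakeup, false if preemption is left to the regular tick.
 *
 * waking_wins_pick is an assumption standing in for the scheduler's
 * normal "who runs next" comparison.
 */
bool wakeup_preempts(enum policy curr, enum policy waking,
                     bool waking_wins_pick)
{
    /* A running idle task is preempted by any non-idle waker. */
    if (curr == POLICY_IDLE && waking != POLICY_IDLE)
        return true;

    /* Batch and idle wakers never preempt on wakeup. */
    if (waking != POLICY_NORMAL)
        return false;

    /* Normal waker: defer to the usual pick. */
    return waking_wins_pick;
}
```

In this model, batch and normal differ only in the second check, whereas
idle also changes the outcome when it is the task being woken against.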

Happy to be proven wrong if there's a use case for batch wherein the
wakeup preempt behavior is useful at the group level as well. Honestly
it feels like it would make sense to revisit the cgroup batch question
when/if additional behaviors are added to further differentiate
batch. For example, maybe a batch cgroup hierarchy could internally
use longer slices and have a slower round-robin preemption rate
amongst its processes. The wakeup bit alone is limited, and the
supposed target workload of low-priority, CPU-intensive threads is
unlikely to have many wakeup edges anyway.