Re: [PATCH v4] sched_ext: Trigger ops.update_idle() from pick_task_idle()

From: Andrea Righi
Date: Tue Oct 15 2024 - 11:32:05 EST


On Tue, Oct 15, 2024 at 04:41:26AM -1000, Tejun Heo wrote:
> Hello, Andrea.
>
> On Tue, Oct 15, 2024 at 01:15:39PM +0200, Andrea Righi wrote:
> ...
> > For example, a BPF scheduler might use logic like the following to keep
> > the CPU active under specific conditions:
> >
> > void BPF_STRUCT_OPS(sched_update_idle, s32 cpu, bool idle)
> > {
> > if (!idle)
> > return;
> > if (condition)
> > scx_bpf_kick_cpu(cpu, 0);
> > }
> >
> > A call to scx_bpf_kick_cpu() wakes up the CPU, so in theory,
> > ops.update_idle() should be triggered again until the condition becomes
> > false. However, this doesn't happen, and scx_bpf_kick_cpu() doesn't
> > produce the expected effect.
>
> I thought more about this scenario and I'm not sure anymore whether we want
> to guarantee that scx_bpf_kick_cpu() is followed by update_idle(cpu, true).
> Here are a couple considerations:
>
> - As implemented, the transtions aren't balanced. ie. When the above
> happens, update_idle(cpu, true) will be generated multiple times without
> intervening update_idle(cpu, false). We can insert artificial false
> transtions but that's cumbersome and...

Agreed, I wouldn't suggest adding artificial false events.

>
> - For the purpose of determining whether a CPU is idle for e.g. task
> placement from ops.select_cpu(). The CPU *should* be considered idle in
> this polling state.
>
> Overall, it feels a bit contrived to generate update_idle() events
> consecutively for this. If a scheduler wants to poll in idle state, can't it
> do something like the following?
>
> - Trigger kick from update_idle(cpu, true) and remember that the CPU is in
> the polling state.
>
> - Keep kicking from ops.dispatch() until polling state is cleared.
>
> As what kick() guarnatees is at least one dispatch event after kicking, this
> is guaranteed to be correct and the control flow, while a bit more
> complicated, makes sense - it triggers dispatch on idle transition and keeps
> dispatching in the idle state.
>
> What do you think?

That seems to work in theory, I'll run some tests to confirm that it
also works in practice. :)

It looks definitely nicer than triggering multiple ops.update_idle()
from the kernel and we can maintain the proper semantic of triggering
update_idle() only on actual idle state changes.

Thanks,
-Andrea