Re: sched_ext/for-6.11: cpu validity check in ops_cpu_valid

From: Vishal Chourasia
Date: Tue Jul 16 2024 - 02:49:43 EST


On Sun, Jul 14, 2024 at 07:17:32PM -1000, Tejun Heo wrote:
> Hello, Vishal.
>
> On Sun, Jul 14, 2024 at 12:44:24AM +0530, Vishal Chourasia wrote:
> > Currently, the BPF scheduler can return a CPU that is marked as possible
> > in the system configurations, but this doesn't guarantee that the CPU is
> > actually present or online at the time. This behavior can lead to
> > scenarios where the scheduler attempts to assign tasks to CPUs that are
> > not available, causing the fallback mechanism to activate and
> > potentially leading to an uneven load distribution across the system.
>
> ops.select_cpu() is allowed to return any CPU and then the scheduler will
> pick a fallback CPU. This is mostly because that's how
> sched_class->select_task_rq() behaves. Here, SCX is just inheriting the
> behavior.
>
> Dispatching to foreign local DSQ using SCX_DSQ_LOCAL_ON also does
> auto-fallback. This is because it's difficult for the BPF scheduler to
> strongly synchronize its dispatch operation against CPU hotplug operations.
>
> > By defalut, When a "not possible" CPU is returned, sched_ext gracefully
> > exits the bpf scheduler.
> >
> > static bool ops_cpu_valid(s32 cpu, const char *where)
> > {
> > if (likely(cpu >= 0 && cpu < nr_cpu_ids && cpu_possible(cpu))) {
> > return true;
> > } else {
> > scx_ops_error("invalid CPU %d%s%s", cpu,
> > where ? " " : "", where ?: "");
> > return false;
> > }
> > }
> >
> > On POWER, a system can have differences in cpu_present and cpu_possible
> > mask. Not present, but possible CPUs can be added later but once added
> > will also be marked set in the cpu present mask.
> >
> > Looks like cpu_present() is a better check.
>
> We can consider tightening each path separately but I'm not sure making
What do you mean by "each path separately"?

> ops_cpu_valid() more strict is a good idea. For example, there's no reason
> to abort because a scheduler is calling scx_bpf_dsq_nr_queued() on an
> offline CPU especially given that the kfunc is allowed from any context
> without any synchronization. It can create aborting bugs which are really
> difficult to reproduce.
I agree, I wouldn't want to kick the BPF scheduler out for things that
can be handled. If an invalid CPU was returned by any sched_class, it's
best to handle it, because we don't have any other option.

However, the case of the BPF scheduler is different; we shouldn't need
to handle corner cases but instead immediately flag such cases.

Consider this: if a BPF scheduler is returning a non-present CPU in
select_cpu, the corresponding task will get scheduled on a CPU (using
the fallback mechanism) that may not be the best placement, causing
inconsistent behavior. And there will be no red flags reported making it
difficult to catch. My point is that sched_ext should be much stricter
towards the BPF scheduler.

Note: There is still the case when a offline CPU is returned by the bpf
scheduler. If sched_ext can catch that and handle it seperately by
calling scx_bpf_select_cpu_dfl

>
> Thanks.
>
> --
> tejun