Re: [PATCH 07/10] sched_ext: Introduce per-node idle cpumasks

From: Yury Norov
Date: Tue Dec 24 2024 - 12:59:44 EST


On Tue, Dec 24, 2024 at 09:18:28AM +0100, Andrea Righi wrote:
> On Mon, Dec 23, 2024 at 08:05:53PM -0800, Yury Norov wrote:
> > On Fri, Dec 20, 2024 at 04:11:39PM +0100, Andrea Righi wrote:

...

> > > + * cpumasks to track idle CPUs within each NUMA node.
> > > + *
> > > + * If SCX_OPS_BUILTIN_IDLE_PER_NODE is not specified, a single flat cpumask
> > > + * from node 0 is used to track all idle CPUs system-wide.
> > > + */
> > > +static struct idle_cpumask **scx_idle_masks;
> > > +
> > > +static struct idle_cpumask *get_idle_mask(int node)
> >
> > Didn't we agree to drop this 'get' thing?
>
> Hm... no? :)
>
> The analogy you pointed out was with get_parity8() which implements a pure
> function, so we should just use parity8() as "get" is implicit.
>
> This case is a bit different IMHO, because it's getting a reference to the
> object (so not a pure function) and we may even have a put_idle_mask()
> potentially.
>
> Personally I like to have the "get" here, but if you think it's confusing
> or it makes the code less readable I'm ok to drop it.

OK, whatever

...

> > > + * Find the best idle CPU in the system, relative to @node.
> > > + *
> > > + * If @node is NUMA_NO_NODE, start from the current node.
> > > + */
> >
> > And if you don't invent this rule for kernel users, you don't need to
> > explain it everywhere.
>
> I think we mentioned treating NUMA_NO_NODE as current node, but I might
> have misunderstood.

That's for userspace to maybe save a syscall. For in-kernel users it's
unneeded for sure.

> In an earlier patch set I was just ignoring
> NUMA_NO_NODE. Should we return an error instead?

Kernel users will never give you NUMA_NO_NODE if you ask them not to
do that. Resolving NO_NODE to current node for kernel users is useless.
They can call numa_node_id() just as well.

Userspace can do everything, so you have to check what they pass. For
userspace, it's up to you either to resolve NO_NODE to current node,
random node, simulate disabled per-numa idlemasks, do anything else or
return an error.