Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
From: Ingo Molnar
Date: Thu Apr 06 2017 - 07:02:36 EST
* Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Thu, 6 Apr 2017, Ingo Molnar wrote:
>
> > Sorry if this is a back and forth - I was somehow convinced that we do need to
> > frob the cpus_allowed mask to get this functionality - but in hindsight I
> > think the counter should be enough.
> >
> > I.e. just have a counter and these two APIs:
> >
> > static inline void migrate_disable(void)
> > {
> > current->migration_disabled++;
> > }
> >
> > ...
> >
> > static inline void migrate_enable(void)
> > {
> > current->migration_disabled--;
> > }
> >
> > ... and make sure the scheduler migration code plus the CPU hotplug code considers
> > the counter.
>
> We tried that some time ago, but there are a lot of places in the scheduler
> which just rely on the cpus_allowed_mask, so we have to chase all of them and
> make sure that new users do not ignore that counter. That's why we chose the
> cpus_allowed_mask approach. And I still think that's the proper thing to do.
But but ...
The number of places in the scheduler where we actually end up migrating a task is
pretty limited:
try_to_wake_up():
- main wakeup code
migrate_swap():
- active NUMA-balancing feature
move_queued_task():
- hotplug CPU-down migration
- changing the affinity mask
The wakeup and NUMA balancing case is trivial to solve: it's an optimization and
we can skip the migration if migration is disabled.
CPU hotplug and changing the affinity mask are the more complex cases, because
there migrating or not migrating is a correctness issue:
- CPU hotplug has to be aware of this anyway, regardless of whether it's solved
via a counter of the affinity mask.
- Changing the affinity mask (set_cpus_allowed()) has two main cases:
the synchronous and asynchronous case:
- synchronous is when the current task changes its own affinity mask, this
should work fine mostly out of box, as we don't call set_cpus_allowed()
from inside migration disabled regions. (We can enforce this via a
debugging check.)
- The asynchronous case is when the affinity task of some other task is
changed - this would not have an immediate effect with migration-disabled
logic, the migration would be delayed to when migration is re-enabled
again.
As for general fragility, is there any reason why a simple debugging check in
set_task_cpu() would not catch most mishaps:
WARN_ON_ONCE(p->state != TASK_RUNNING && p->migration_disabled);
... or something like that?
I.e. my point is that I think using a counter would be much simpler, yet still as
robust and maintainable. We could in fact move migrate_disable()/enable() upstream
straight away and eliminate this small fork of functionality between mainline and
-rt.
Thanks,
Ingo