Re: [PATCH v3 3/6] sched: Change wait_task_inactive()s match_state

From: Ingo Molnar
Date: Sun Sep 04 2022 - 06:45:01 EST



* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> Make wait_task_inactive()'s @match_state work like ttwu()'s @state.
>
> That is, instead of an equal comparison, use it as a mask. This allows
> matching multiple block conditions.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
> kernel/sched/core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3295,7 +3295,7 @@ unsigned long wait_task_inactive(struct
> * is actually now running somewhere else!
> */
> while (task_running(rq, p)) {
> - if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> + if (match_state && !(READ_ONCE(p->__state) & match_state))
> return 0;

We lose the unlikely annotation there - but I guess it probably never
really mattered anyway?
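
For reference, here is a minimal userspace sketch of the old equality test
next to the new mask test. The EX_* constants and ex_* helpers are
hypothetical stand-ins, not kernel symbols, and ex_unlikely() assumes a
GCC/Clang-style __builtin_expect():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for task state bits; not the kernel's definitions. */
#define EX_TASK_INTERRUPTIBLE	0x0001u
#define EX_TASK_STOPPED		0x0004u
#define EX_TASK_TRACED		0x0008u

/* Branch hint, assuming a GCC/Clang-compatible compiler; plain no-op otherwise. */
#if defined(__GNUC__)
#define ex_unlikely(x)	__builtin_expect(!!(x), 0)
#else
#define ex_unlikely(x)	(x)
#endif

/* Old check: bail out unless the state equals match_state exactly. */
static bool ex_bail_equal(unsigned int state, unsigned int match_state)
{
	return match_state && ex_unlikely(state != match_state);
}

/* New check: bail out unless the state shares at least one bit with the mask. */
static bool ex_bail_mask(unsigned int state, unsigned int match_state)
{
	return match_state && !(state & match_state);
}

/* Spot checks showing where the two forms agree and where they differ. */
static void ex_check_bail(void)
{
	unsigned int mask = EX_TASK_STOPPED | EX_TASK_TRACED;

	/* A multi-bit mask never equals a single-bit state, but it does overlap it. */
	assert(ex_bail_equal(EX_TASK_TRACED, mask));
	assert(!ex_bail_mask(EX_TASK_TRACED, mask));

	/* A state outside the mask bails out under both forms. */
	assert(ex_bail_equal(EX_TASK_INTERRUPTIBLE, mask));
	assert(ex_bail_mask(EX_TASK_INTERRUPTIBLE, mask));

	/* match_state == 0 disables the check in both forms. */
	assert(!ex_bail_equal(EX_TASK_TRACED, 0));
	assert(!ex_bail_mask(EX_TASK_TRACED, 0));
}
```

The hint only affects code layout, not the result, which is why dropping it
should be harmless in a busy-wait loop like this one.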

Suggestion #1:

- Shouldn't we rename task_running() to something like task_on_cpu()? The
task_running() primitive is similar to TASK_RUNNING but is not based off
any TASK_FLAGS.

Suggestion #2:

- Shouldn't we eventually standardize on task->on_cpu on UP kernels too?
UP kernels don't really matter anymore, and doing so removes #ifdefs and
makes the code easier to read.
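
Roughly what I have in mind, as a userspace sketch only. The ex_* types,
the EX_CONFIG_SMP switch and the assumption that ->on_cpu is kept up to
date on UP are all hypothetical here, not the current kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-ins for task_struct and rq. */
struct ex_task { int on_cpu; };
struct ex_rq { struct ex_task *curr; };

/* Shape with a config switch: two definitions of "is p on a CPU?". */
static bool ex_task_running_ifdef(const struct ex_rq *rq, const struct ex_task *p)
{
#ifdef EX_CONFIG_SMP
	(void)rq;
	return p->on_cpu != 0;
#else
	return rq->curr == p;
#endif
}

/*
 * Unified shape: one definition, assuming ->on_cpu is maintained on
 * every configuration. The rq argument is kept only to match the
 * existing call signature.
 */
static bool ex_task_on_cpu(const struct ex_rq *rq, const struct ex_task *p)
{
	(void)rq;
	return p->on_cpu != 0;
}

/* Both shapes agree as long as ->on_cpu tracks rq->curr. */
static void ex_check_on_cpu(void)
{
	struct ex_task running = { .on_cpu = 1 };
	struct ex_task idle = { .on_cpu = 0 };
	struct ex_rq rq = { .curr = &running };

	assert(ex_task_running_ifdef(&rq, &running) == ex_task_on_cpu(&rq, &running));
	assert(ex_task_running_ifdef(&rq, &idle) == ex_task_on_cpu(&rq, &idle));
}
```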


> cpu_relax();
> }
> @@ -3310,7 +3310,7 @@ unsigned long wait_task_inactive(struct
> running = task_running(rq, p);
> queued = task_on_rq_queued(p);
> ncsw = 0;
> - if (!match_state || READ_ONCE(p->__state) == match_state)
> + if (!match_state || (READ_ONCE(p->__state) & match_state))
> ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
> task_rq_unlock(rq, p, &rf);

Suggestion #3:

- Couldn't the following users with a 0 mask:

drivers/powercap/idle_inject.c: wait_task_inactive(iit->tsk, 0);
fs/coredump.c: wait_task_inactive(ptr->task, 0);

Use ~0 instead (exposed as TASK_ANY or so) and then we can drop the
!match_state special case?

They'd do something like:

drivers/powercap/idle_inject.c: wait_task_inactive(iit->tsk, TASK_ANY);
fs/coredump.c: wait_task_inactive(ptr->task, TASK_ANY);

It's not a 100% equivalent transformation, but it looks OK at first sight:
->__state will be some nonzero mask for genuine tasks waiting to schedule
out, so any match will be functionally the same as a 0 flag telling us not
to check any of the bits, right? I might be missing something though.
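
A quick userspace sketch of that reasoning, in case it helps. EX_TASK_ANY
and the other EX_* names are hypothetical (nothing like them exists in the
tree as of this patch), and the helpers only model the second hunk's
condition:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; EX_TASK_ANY is the proposed all-bits mask. */
#define EX_TASK_RUNNING		0x0000u
#define EX_TASK_INTERRUPTIBLE	0x0001u
#define EX_TASK_UNINTERRUPTIBLE	0x0002u
#define EX_TASK_ANY		(~0u)

/* Condition as in the patch: a 0 mask means "don't check the state". */
static bool ex_match_special(unsigned int state, unsigned int match_state)
{
	return !match_state || (state & match_state);
}

/* Condition with the !match_state special case dropped. */
static bool ex_match_plain(unsigned int state, unsigned int match_state)
{
	return (state & match_state) != 0;
}

static void ex_check_any(void)
{
	/* For any nonzero state, passing 0 before equals passing EX_TASK_ANY after. */
	assert(ex_match_special(EX_TASK_INTERRUPTIBLE, 0) ==
	       ex_match_plain(EX_TASK_INTERRUPTIBLE, EX_TASK_ANY));
	assert(ex_match_special(EX_TASK_UNINTERRUPTIBLE, 0) ==
	       ex_match_plain(EX_TASK_UNINTERRUPTIBLE, EX_TASK_ANY));

	/* The one case that differs: a state of 0 never matches any mask. */
	assert(ex_match_special(EX_TASK_RUNNING, 0));
	assert(!ex_match_plain(EX_TASK_RUNNING, EX_TASK_ANY));
}
```

So the two forms only diverge if ->__state can be 0 at that point, which is
the part that would need checking for these two callers.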

Thanks,

Ingo