Re: [PATCH v3 3/6] sched: Change wait_task_inactive()s match_state

From: Peter Zijlstra
Date: Wed Sep 07 2022 - 05:31:58 EST


On Tue, Sep 06, 2022 at 12:54:34PM +0200, Peter Zijlstra wrote:

> > Suggestion #3:
> >
> > - Couldn't the following users with a 0 mask:
> >
> > drivers/powercap/idle_inject.c: wait_task_inactive(iit->tsk, 0);
> > fs/coredump.c: wait_task_inactive(ptr->task, 0);
> >
> > use ~0 instead (exposed as TASK_ANY or so), so that we can drop the
> > !match_state special case?
> >
> > They'd do something like:
> >
> > drivers/powercap/idle_inject.c: wait_task_inactive(iit->tsk, TASK_ANY);
> > fs/coredump.c: wait_task_inactive(ptr->task, TASK_ANY);
> >
> > It's not a 100% equivalent transformation, but it looks OK at first
> > sight: ->__state will be some nonzero mask for genuine tasks waiting
> > to schedule out, so any match will be functionally the same as a 0
> > flag telling us not to check any of the bits, right? I might be
> > missing something though.
>
> I too am thinking that should work. Added patch for that.
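
To see where the two forms can diverge, here is a minimal, standalone
userspace sketch. The state constants are copied from
include/linux/sched.h for illustration only, and old_bail()/new_bail()
are made-up names for the before/after checks in wait_task_inactive():

#include <stdio.h>

/* Values mirroring include/linux/sched.h; for illustration only. */
#define TASK_RUNNING		0x0000
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_STATE_MAX		0x2000
#define TASK_ANY		(TASK_STATE_MAX-1)	/* 0x1fff: every state bit */

/* Old check: a 0 match_state means "don't look at ->__state at all". */
static int old_bail(unsigned int state, unsigned int match_state)
{
	return match_state && !(state & match_state);
}

/* New check: relies on TASK_ANY having every state bit set. */
static int new_bail(unsigned int state, unsigned int match_state)
{
	return !(state & match_state);
}

int main(void)
{
	unsigned int states[] = {
		TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE,
	};

	for (int i = 0; i < 3; i++)
		printf("state=0x%04x bail(old,0)=%d bail(new,TASK_ANY)=%d\n",
		       states[i], old_bail(states[i], 0),
		       new_bail(states[i], TASK_ANY));
	return 0;
}

The only input where the two disagree is ->__state == TASK_RUNNING (0):
the old 0-mask call keeps waiting, while the TASK_ANY call bails with 0.
Per the reasoning quoted above that should be fine for these two
callers, since the tasks they wait on are headed into a blocked state.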

---
Subject: sched: Add TASK_ANY for wait_task_inactive()
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Tue Sep 6 12:39:55 CEST 2022

Now that wait_task_inactive()'s @match_state argument is a mask (like
ttwu()), it is possible to replace the special !match_state case with an
'all states' value (TASK_ANY, which has every state bit set) such that
any blocked state will match.

Suggested-by: Ingo Molnar <mingo@xxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
 drivers/powercap/idle_inject.c |    2 +-
 fs/coredump.c                  |    2 +-
 include/linux/sched.h          |    2 ++
 kernel/sched/core.c            |   16 ++++++++--------
 4 files changed, 12 insertions(+), 10 deletions(-)

--- a/drivers/powercap/idle_inject.c
+++ b/drivers/powercap/idle_inject.c
@@ -254,7 +254,7 @@ void idle_inject_stop(struct idle_inject
 		iit = per_cpu_ptr(&idle_inject_thread, cpu);
 		iit->should_run = 0;
 
-		wait_task_inactive(iit->tsk, 0);
+		wait_task_inactive(iit->tsk, TASK_ANY);
 	}
 
 	cpu_hotplug_enable();
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -412,7 +412,7 @@ static int coredump_wait(int exit_code,
 		 */
 		ptr = core_state->dumper.next;
 		while (ptr != NULL) {
-			wait_task_inactive(ptr->task, 0);
+			wait_task_inactive(ptr->task, TASK_ANY);
 			ptr = ptr->next;
 		}
 	}
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -101,6 +101,8 @@ struct task_group;
 #define TASK_RTLOCK_WAIT		0x1000
 #define TASK_STATE_MAX			0x2000
 
+#define TASK_ANY			(TASK_STATE_MAX-1)
+
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
 #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3254,12 +3254,12 @@ int migrate_swap(struct task_struct *cur
 /*
  * wait_task_inactive - wait for a thread to unschedule.
  *
- * If @match_state is nonzero, it's the @p->state value just checked and
- * not expected to change. If it changes, i.e. @p might have woken up,
- * then return zero. When we succeed in waiting for @p to be off its CPU,
- * we return a positive number (its total switch count). If a second call
- * a short while later returns the same number, the caller can be sure that
- * @p has remained unscheduled the whole time.
+ * Wait for the thread to block in any of the states set in @match_state.
+ * If it changes, i.e. @p might have woken up, then return zero. When we
+ * succeed in waiting for @p to be off its CPU, we return a positive number
+ * (its total switch count). If a second call a short while later returns the
+ * same number, the caller can be sure that @p has remained unscheduled the
+ * whole time.
  *
  * The caller must ensure that the task *will* unschedule sometime soon,
  * else this function might spin for a *long* time. This function can't
@@ -3295,7 +3295,7 @@ unsigned long wait_task_inactive(struct
 		 * is actually now running somewhere else!
 		 */
 		while (task_on_cpu(rq, p)) {
-			if (match_state && !(READ_ONCE(p->__state) & match_state))
+			if (!(READ_ONCE(p->__state) & match_state))
 				return 0;
 			cpu_relax();
 		}
@@ -3310,7 +3310,7 @@ unsigned long wait_task_inactive(struct
 		running = task_on_cpu(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || (READ_ONCE(p->__state) & match_state))
+		if (READ_ONCE(p->__state) & match_state)
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 		task_rq_unlock(rq, p, &rf);
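
For reference, the double-call idiom that the (updated) comment above
describes could look something like the following. This is a
hypothetical, uncompiled kernel-side sketch: ensure_stayed_inactive()
and its -EAGAIN policy are invented for illustration, while
wait_task_inactive() and TASK_ANY are as in the patch:

#include <linux/errno.h>
#include <linux/sched.h>

/*
 * Sample the switch-count cookie, do work that assumes @p stays off
 * the CPU, then re-sample. wait_task_inactive() returns p->nvcsw with
 * the MSB set, so getting the same nonzero cookie twice means @p
 * remained unscheduled in between.
 */
static int ensure_stayed_inactive(struct task_struct *p)
{
	unsigned long ncsw;

	ncsw = wait_task_inactive(p, TASK_ANY);
	if (!ncsw)
		return -EAGAIN;	/* @p was still runnable; try again */

	/* ... inspect @p while it is known to be off its CPU ... */

	if (wait_task_inactive(p, TASK_ANY) != ncsw)
		return -EAGAIN;	/* @p was scheduled meanwhile; retry */

	return 0;
}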