Re: [PATCH RFC v2 tip/core/rcu 01/22] sched/core: Add function to sample state of locked-down task

From: Steven Rostedt
Date: Thu Mar 19 2020 - 13:22:44 EST


On Wed, 18 Mar 2020 17:10:39 -0700
paulmck@xxxxxxxxxx wrote:

> From: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
>
> A running task's state can be sampled in a consistent manner (for example,
> for diagnostic purposes) simply by invoking smp_call_function_single()
> on its CPU, which may be obtained using task_cpu(), then having the
> IPI handler verify that the desired task is in fact still running.
> However, if the task is not running, this sampling can in theory be done
> immediately and directly. In practice, the task might start running at
> any time, including during the sampling period. Gaining a consistent
> sample of a not-running task therefore requires that something be done
> to lock down the target task's state.
>
> This commit therefore adds a try_invoke_on_locked_down_task() function
> that invokes a specified function if the specified task can be locked
> down, returning true if successful and if the specified function returns
> true. Otherwise this function simply returns false. Given that the
> function passed to try_invoke_on_nonrunning_task() might be invoked with
> a runqueue lock held, that function had better be quite lightweight.
>
> The function is passed the target task's task_struct pointer and the
> argument passed to try_invoke_on_locked_down_task(), allowing easy access
> to task state and to a location for further variables to be passed in
> and out.
>
> Note that the specified function will be called even if the specified
> task is currently running. The function can use ->on_rq and task_curr()
> to quickly and easily determine the task's state, and can return false
> if this state is not to the function's liking. The caller of teh

s/teh/the/

> try_invoke_on_locked_down_task() would then see the false return value,
> and could take appropriate action, for example, trying again later or
> sending an IPI if matters are more urgent.
>
> It is expected that use cases such as the RCU CPU stall warning code will
> simply return false if the task is currently running. However, there are
> use cases involving nohz_full CPUs where the specified function might
> instead fall back to an alternative sampling scheme that relies on heavier
> synchronization (such as memory barriers) in the target task.
>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> [ paulmck: Apply feedback from Peter Zijlstra and Steven Rostedt. ]
> [ paulmck: Invoke if running to handle feedback from Mathieu Desnoyers. ]
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
> Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> Cc: Ben Segall <bsegall@xxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> ---
> include/linux/wait.h | 2 ++
> kernel/sched/core.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 50 insertions(+)
>
> diff --git a/include/linux/wait.h b/include/linux/wait.h
> index 3283c8d..e2bb8ed 100644
> --- a/include/linux/wait.h
> +++ b/include/linux/wait.h
> @@ -1148,4 +1148,6 @@ int autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, i
> (wait)->flags = 0; \
> } while (0)
>
> +bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg);
> +
> #endif /* _LINUX_WAIT_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index fc1dfc0..195eba0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2580,6 +2580,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> *
> * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
> * __schedule(). See the comment for smp_mb__after_spinlock().
> + *
> + * A similar smb_rmb() lives in try_invoke_on_locked_down_task().
> */
> smp_rmb();
> if (p->on_rq && ttwu_remote(p, wake_flags))
> @@ -2654,6 +2656,52 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> }
>
> /**
> + * try_invoke_on_locked_down_task - Invoke a function on task in fixed state
> + * @p: Process for which the function is to be invoked.
> + * @func: Function to invoke.
> + * @arg: Argument to function.
> + *
> + * If the specified task can be quickly locked into a definite state
> + * (either sleeping or on a given runqueue), arrange to keep it in that
> + * state while invoking @func(@arg). This function can use ->on_rq and
> + * task_curr() to work out what the state is, if required. Given that
> + * @func can be invoked with a runqueue lock held, it had better be quite
> + * lightweight.
> + *
> + * Returns:
> + * @false if the task slipped out from under the locks.
> + * @true if the task was locked onto a runqueue or is sleeping.
> + * However, @func can override this by returning @false.

Should probably state that it will return false if the state could not be
locked, otherwise it returns the return code of the function.

I'm wondering if we shouldn't have the function's return code be passed
back through a parameter, and have this return either true (locked and
function called) or false (not locked and function not called).
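
Something like the following, as a rough user-space sketch only. The
names mock_task, mock_try_invoke(), sample_on_rq() and the func_ret
out-parameter are all made up for illustration, and the "lockable" flag
merely stands in for the pi_lock/rq-lock dance in the real function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; not the kernel type. */
struct mock_task {
	bool lockable;	/* models "the task could be locked down" */
	int on_rq;	/* models p->on_rq */
};

typedef bool (*task_func_t)(struct mock_task *t, void *arg);

/*
 * Returns true only if the task was locked down and @func was called.
 * The verdict of @func is passed back separately through @func_ret,
 * so the caller can tell "could not lock" from "func said no".
 */
static bool mock_try_invoke(struct mock_task *p, task_func_t func,
			    void *arg, bool *func_ret)
{
	assert(func != NULL && func_ret != NULL);

	if (!p->lockable)
		return false;	/* not locked, @func never called */

	*func_ret = func(p, arg);
	return true;		/* locked and @func called */
}

/* Example callback: record ->on_rq, reject tasks that are on a runqueue. */
static bool sample_on_rq(struct mock_task *t, void *arg)
{
	*(int *)arg = t->on_rq;
	return t->on_rq == 0;
}
```

With that split, a caller that sees false knows the function never ran,
and only needs to look at func_ret when the return value is true.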


> + */
> +bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
> +{
> + bool ret = false;
> + struct rq_flags rf;
> + struct rq *rq;
> +
> + lockdep_assert_irqs_enabled();
> + raw_spin_lock_irq(&p->pi_lock);
> + if (p->on_rq) {
> + rq = __task_rq_lock(p, &rf);
> + if (task_rq(p) == rq)
> + ret = func(p, arg);
> + rq_unlock(rq, &rf);
> + } else {
> + switch (p->state) {
> + case TASK_RUNNING:
> + case TASK_WAKING:
> + break;
> + default:

Don't we need a comment here about why we have an rmb() and where the
matching wmb() is?

-- Steve

> + smp_rmb();
> + if (!p->on_rq)
> + ret = func(p, arg);
> + }
> + }
> + raw_spin_unlock_irq(&p->pi_lock);
> + return ret;
> +}
> +
> +/**
> * wake_up_process - Wake up a specific process
> * @p: The process to be woken up.
> *