Re: [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT
From: Eric W. Biederman
Date: Mon Jan 17 2022 - 14:13:07 EST
Valentin Schneider <valentin.schneider@xxxxxxx> writes:
> TASK_RTLOCK_WAIT currently isn't part of TASK_REPORT, thus a task blocking
> on an rtlock will appear as having a task state == 0, IOW TASK_RUNNING.
>
> The actual state is saved in p->saved_state, but reading it after reading
> p->__state has a few issues:
> o that could still be TASK_RUNNING in the case of e.g. rt_spin_lock
> o ttwu_state_match() might have changed that to TASK_RUNNING
>
> Add TASK_RTLOCK_WAIT to TASK_REPORT.
>
> Reported-by: Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx>
> Signed-off-by: Valentin Schneider <valentin.schneider@xxxxxxx>
> Reviewed-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> ---
> fs/proc/array.c | 3 ++-
> include/linux/sched.h | 17 +++++++++--------
> include/trace/events/sched.h | 1 +
> 3 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index ff869a66b34e..f4cae65529a6 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -128,9 +128,10 @@ static const char * const task_state_array[] = {
> "X (dead)", /* 0x10 */
> "Z (zombie)", /* 0x20 */
> "P (parked)", /* 0x40 */
> + "L (rt-locked)", /* 0x80 */
>
> /* states beyond TASK_REPORT: */
> - "I (idle)", /* 0x80 */
> + "I (idle)", /* 0x100 */
> };
I think this is at least potentially an ABI break. I have a vague memory
that userspace is not prepared to be shown new task states, which is
why we encode some of our states the way we do.
Maybe it was just someone being very conservative.
Still, if you are going to expose new states to userspace and risk
breaking it, can you do some basic analysis and report what ps and
similar programs do with them?
Changing the userspace-visible output of proc without even mentioning
that you are doing so looks dangerous indeed.
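
To make the userspace-visible effect concrete, here is a minimal
userspace model of the patched table. It is a sketch, not kernel code.
It assumes the array index is the 1-based position of the highest set
report bit, as the 0x.. comments in the hunk suggest. The entries below
0x10 are not shown in the quoted hunk and are filled in from memory, and
fls_model() and state_name() are hypothetical helper names.

```c
#include <stddef.h>

/*
 * Userspace model of the patched table. Entries from 0x10 up come from
 * the quoted hunk; the first five are an assumption (not in the hunk).
 */
static const char * const task_state_array[] = {
	"R (running)",		/* 0x00, assumed */
	"S (sleeping)",		/* 0x01, assumed */
	"D (disk sleep)",	/* 0x02, assumed */
	"T (stopped)",		/* 0x04, assumed */
	"t (tracing stop)",	/* 0x08, assumed */
	"X (dead)",		/* 0x10 */
	"Z (zombie)",		/* 0x20 */
	"P (parked)",		/* 0x40 */
	"L (rt-locked)",	/* 0x80, new in this patch */
	"I (idle)",		/* 0x100, moved from 0x80 */
};

/* 1-based position of the highest set bit, 0 when no bit is set. */
static unsigned int fls_model(unsigned int x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Map a report bit to the string userspace would see (hypothetical helper). */
static const char *state_name(unsigned int report_bit)
{
	size_t n = sizeof(task_state_array) / sizeof(task_state_array[0]);
	unsigned int idx = fls_model(report_bit);

	return idx < n ? task_state_array[idx] : "? (unknown)";
}
```

In this model state_name(0x80) yields the new "L (rt-locked)" string,
a letter no existing consumer has seen before.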
Looking in the history, commit 74e37200de8e ("proc: cleanup/simplify
get_task_state/task_state_array") seems to best document the concern
that userspace does not know how to handle new states.
The fact we have had a parked state for quite a few years despite that
concern seems to argue it is possible to extend the states. Or perhaps
it just argues that parked states are rare enough it does not matter.
It is definitely the case that the ps manpage documents the possible
states and as such they could be a part of anyone's shell scripts.
From the ps man page:
> Here are the different values that the s, stat and state output
> specifiers (header "STAT" or "S") will display to describe the
> state of a process:
>
> D uninterruptible sleep (usually IO)
> I Idle kernel thread
> R running or runnable (on run queue)
> S interruptible sleep (waiting for an event to complete)
> T stopped by job control signal
> t stopped by debugger during the tracing
> W paging (not valid since the 2.6.xx kernel)
> X dead (should never be seen)
> Z defunct ("zombie") process, terminated but not reaped by its parent
>
So it looks like a change that adds to the number of states in the
kernel should update the ps man page as well.
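
As an illustration of the kind of consumer that could trip over this,
here is a rough sketch of a parser that only accepts the letters listed
in the ps man page excerpt above. It assumes the "State:" line format of
/proc/<pid>/status. The helper names are hypothetical, and I have not
checked whether any real tool is written this way.

```c
#include <string.h>

/* State letters listed in the ps man page excerpt quoted above. */
static const char known_ps_states[] = "DIRSTtWXZ";

/* Return 1 if the letter is one the ps man page documents, else 0. */
static int is_documented_state(char c)
{
	return c != '\0' && strchr(known_ps_states, c) != NULL;
}

/*
 * Pull the state letter out of a line such as "State:\tS (sleeping)".
 * Returns '\0' if the line does not start with "State:".
 */
static char status_line_state(const char *line)
{
	static const char prefix[] = "State:";

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return '\0';
	line += sizeof(prefix) - 1;
	while (*line == ' ' || *line == '\t')
		line++;
	return *line;
}
```

A consumer built like this rejects "L (rt-locked)", and it already
rejects "P (parked)", which the man page does not list either.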
Eric