Re: [RFC PATCH 1/2] sched/tracing: Don't re-read p->state when emitting sched_switch event

From: Steven Rostedt
Date: Wed Dec 08 2021 - 15:12:18 EST


On Mon, 29 Nov 2021 12:36:00 +0000
Valentin Schneider <valentin.schneider@xxxxxxx> wrote:

> As of commit
>
> c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
>
> the following sequence becomes possible:
>
> p->__state = TASK_INTERRUPTIBLE;
> __schedule()
>   deactivate_task(p);
>                                   ttwu()
>                                     READ !p->on_rq
>                                     p->__state=TASK_WAKING
>   trace_sched_switch()
>     __trace_sched_switch_state()
>       task_state_index()
>         return 0;
>
> TASK_WAKING isn't in TASK_REPORT, so the task appears as TASK_RUNNING in
> the trace event.
>
> Prevent this by pushing the value read from __schedule() down the trace
> event.
>
> Reported-by: Abhijeet Dharmapurikar <adharmap@xxxxxxxxxxx>
> Signed-off-by: Valentin Schneider <valentin.schneider@xxxxxxx>
> ---
> include/linux/sched.h | 11 ++++++++---
> include/trace/events/sched.h | 11 +++++++----
> kernel/sched/core.c | 4 ++--
> kernel/trace/fgraph.c | 4 +++-
> kernel/trace/ftrace.c | 4 +++-
> kernel/trace/trace_events.c | 8 ++++++--
> kernel/trace/trace_sched_switch.c | 1 +

I believe you may have missed some functions that register the sched_switch
event. Do a git grep on register_trace_sched_switch.

-- Steve


> 7 files changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d2e261adb8ea..d00837d12b9d 100644