Re: [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending()

From: John Stultz
Date: Wed Mar 20 2024 - 18:42:51 EST


On Tue, Mar 19, 2024 at 10:24 PM Vishal Chourasia <vishalc@xxxxxxxxxxxxx> wrote:
> On Mon, Mar 18, 2024 at 12:28:33PM -0700, John Stultz wrote:
> > Zimuzo reported seeing occasional cases in perfetto traces where
> > tasks went from sleeping directly to trace_sched_wakeup()
> > without always seeing a trace_sched_waking().
> >
> > Looking at the code, trace_sched_wakeup() is only called in
> > ttwu_do_wakeup()
> >
> > The call paths that get you to ttwu_do_wakeup() are:
> > try_to_wake_up() -> ttwu_do_wakeup()
> > try_to_wake_up() -> ttwu_runnable() -> ttwu_do_wakeup()
> > try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_do_wakeup()
> > sched_ttwu_pending() -> ttwu_do_activate() -> ttwu_do_wakeup()
> Notably, sched_ttwu_pending() is invoked for remote wakeups.
>
> Given this, I anticipate a scenario similar to the following
> occurred: When a process (P) is to be awakened on a remote CPU,
> the scheduler adds P to the remote CPU's wakelist,a per-CPU queue,
> and sends an IPI to the remote CPU. This action triggers
> sched_ttwu_pending() on the remote CPU, which then processes the
> wakelist and wakes up the queued processes.

Hey! Thanks for taking a look here and sharing this feedback.

Indeed, looking closer here I do have a hard time seeing how we can
get to sched_ttwu_pending() without having gone through the paths in
try_to_wake_up() that would call trace_sched_wakeup():
try_to_wake_up()->[ttwu_queue ->]
ttwu_queue_wakelist()->sched_ttwu_pending()->...

> In this scenario, the "waking trace" of P, signifying the initiation
> of the wake-up, is recorded on the CPU where try_to_wake_up was executed.
> Meanwhile, the "wakeup trace," denoting the completion of the wake-up,
> is observed on the remote CPU where sched_ttwu_pending() is executed.
>
> Is there a possibility that something other than the above occurred
> in your case?

I suspect that the case reported may be that the task went to sleep
right after the trace_sched_waking(). Which could result in the
transition from sleeping to trace_sched_wakeup() without the
trace_sched_waking() inbetween. The added latency from the remote
wakeup probably increases the chance for this race to occur.

So I'll withdraw this change for now and sync back up to check if my
suspicions are correct or not.

Thanks so much again for pointing this out.
-john