Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task

From: Marcelo Tosatti
Date: Tue Oct 13 2020 - 13:14:19 EST


On Thu, Oct 08, 2020 at 09:54:44PM +0200, Frederic Weisbecker wrote:
> On Thu, Oct 08, 2020 at 02:54:09PM -0300, Marcelo Tosatti wrote:
> > On Thu, Oct 08, 2020 at 02:22:56PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote:
> > > > When adding a tick dependency to a task, its necessary to
> > > > wakeup the CPU where the task resides to reevaluate tick
> > > > dependencies on that CPU.
> > > >
> > > > However the current code wakes up all nohz_full CPUs, which
> > > > is unnecessary.
> > > >
> > > > Switch to waking up a single CPU, by using ordering of writes
> > > > to task->cpu and task->tick_dep_mask.
> > > >
> > > > From: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > > > Suggested-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > > Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > > >
> > > > Index: linux-2.6/kernel/time/tick-sched.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/kernel/time/tick-sched.c
> > > > +++ linux-2.6/kernel/time/tick-sched.c
> > > > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu)
> > > > irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
> > > > }
> > > >
> > > > +static void tick_nohz_kick_task(struct task_struct *tsk)
> > > > +{
> > > > + int cpu = task_cpu(tsk);
> > > > +
> > > > + /*
> > > > + * If the task concurrently migrates to another cpu,
> > > > + * we guarantee it sees the new tick dependency upon
> > > > + * schedule.
> > > > + *
> > > > + *
> > > > + * set_task_cpu(p, cpu);
> > > > + * STORE p->cpu = @cpu
> > > > + * __schedule() (switch to task 'p')
> > > > + * LOCK rq->lock
> > > > + * smp_mb__after_spin_lock() STORE p->tick_dep_mask
> > > > + * tick_nohz_task_switch() smp_mb() (atomic_fetch_or())
> > > > + * LOAD p->tick_dep_mask LOAD p->cpu
> > > > + */
> > > > +
> > > > + preempt_disable();
> > > > + if (cpu_online(cpu))
> > > > + tick_nohz_full_kick_cpu(cpu);
> > > > + preempt_enable();
> > > > +}
> > >
> > > So we need to kick the CPU unconditionally, or only when the task is
> > > actually running? AFAICT we only care about current->tick_dep_mask.
> >
> > tick is necessary to execute run_posix_cpu_timers, from tick interrupt,
> > even if task is not running.
>
> Yes but if the task isn't running, run_posix_cpu_timers() doesn't have
> anything to elapse. So indeed we can spare the IPI if the task is not
> running. Provided ordering makes sure that the task sees the new dependency
> when it schedules in of course.

True.

* p->on_cpu <- { 0, 1 }:
*
* is set by prepare_task() and cleared by finish_task() such that it will be
* set before p is scheduled-in and cleared after p is scheduled-out, both
* under rq->lock. Non-zero indicates the task is running on its CPU.


CPU-0 (tick_set_dep) CPU-1 (task switch)

STORE p->tick_dep_mask
smp_mb() (atomic_fetch_or())
LOAD p->on_cpu


context_switch(prev, next)
STORE next->on_cpu = 1
... [*]

LOAD current->tick_dep_mask


Don't see any explicit memory barrier in the [*] section?