Re: [patch 2/2] nohz: try to avoid IPI when setting tick dependency for task

From: Marcelo Tosatti
Date: Thu Sep 10 2020 - 14:53:54 EST


On Thu, Sep 03, 2020 at 05:01:53PM +0200, Frederic Weisbecker wrote:
> On Tue, Aug 25, 2020 at 03:41:49PM -0300, Marcelo Tosatti wrote:
> > When enabling per-CPU POSIX timers, an IPI to nohz_full CPUs might be
> > performed (to force them to re-read the tick dependencies and possibly
> > restart the tick on a given CPU).
> >
> > A common case is for applications that run on nohz_full= CPUs
> > to not use POSIX timers (e.g. DPDK).
> >
> > This patch optimizes tick_nohz_dep_set_task() to avoid kicking
> > all nohz_full= CPUs when the task's allowed CPU mask does not
> > intersect with the nohz_full= CPU mask.
> >
> > This reduces interruptions to nohz_full= CPUs.
> >
> > ---
> > kernel/time/tick-sched.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > Index: linux-2.6/kernel/time/tick-sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/time/tick-sched.c
> > +++ linux-2.6/kernel/time/tick-sched.c
> > @@ -383,11 +383,16 @@ void tick_nohz_dep_set_task(struct task_
> > tick_nohz_full_kick();
> > preempt_enable();
> > } else {
> > + unsigned long flags;
> > +
> > /*
> > * Some future tick_nohz_full_kick_task()
> > - * should optimize this.
> > + * should further optimize this.
> > */
> > - tick_nohz_full_kick_all();
> > + raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> > + if (cpumask_intersects(&tsk->cpus_mask, tick_nohz_full_mask))
> > + tick_nohz_full_kick_all();
> > + raw_spin_unlock_irqrestore(&tsk->pi_lock, flags);
> > }
> > }
> > }
> >
> >
>
> Not long ago, Peterz suggested that we simply do:
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index f0199a4ba1ad..42ce8e458013 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -357,17 +357,26 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);
> void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit)
> {
> if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) {
> - if (tsk == current) {
> - preempt_disable();
> - tick_nohz_full_kick();
> - preempt_enable();
> - } else {
> - /*
> - * Some future tick_nohz_full_kick_task()
> - * should optimize this.
> - */
> - tick_nohz_full_kick_all();
> - }
> + int cpu = task_cpu(tsk);
> +
> + /*
> + * If the task concurrently migrates to another cpu,
> + * we guarantee it sees the new tick dependency upon
> + * schedule.
> + *
> + * set_task_cpu(p, cpu);
> + * STORE p->cpu = @cpu
> + * __schedule() (switch to task 'p')
> + * LOCK rq->lock
> + * smp_mb__after_spin_lock() STORE p->tick_dep_mask
> + * tick_nohz_task_switch() smp_mb() (atomic_fetch_or())
> + * LOAD p->tick_dep_mask LOAD p->cpu
> + */
> +
> + preempt_disable();
> + if (cpu_online(cpu))
> + tick_nohz_full_kick_cpu(cpu);
> + preempt_enable();
> }
> }
> EXPORT_SYMBOL_GPL(tick_nohz_dep_set_task);

This can also be used for the signal case... thanks.