Re: [PATCH] tick/nohz: Reduce the critical region for jiffies_seq

From: Thomas Gleixner
Date: Sun Nov 15 2020 - 14:46:36 EST


On Wed, Nov 11 2020 at 17:11, Yunfeng Ye wrote:
> When nohz or nohz_full is configured, the concurrency calls of
> tick_do_update_jiffies64 increases,

Why?

> and the conflict between jiffies_lock and jiffies_seq increases,
> especially in multi-core scenarios.

This does not make sense. The sequence counter is updated when holding
the lock, so there is no conflict between the lock and the sequence
count.

> However, it is unnecessary to update the jiffies_seq lock multiple
> times in a tick period, so the critical region of the jiffies_seq
> can be reduced to reduce latency overheads.

This does not make sense either. Before taking the lock we have:

	delta = ktime_sub(now, READ_ONCE(last_jiffies_update));
	if (delta < tick_period)
		return;

as a lockless quick check.
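
For illustration only, here is a user-space sketch of that pattern: a
lockless quick check followed by the real update under a lock. This is
not the kernel code. All demo_* names, the spinlock built on atomic_flag,
the 4 ms period and the recheck under the lock are assumptions made for
this sketch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins, not the kernel's variables. */
static _Atomic int64_t demo_last_update_ns;          /* models last_jiffies_update */
static uint64_t demo_ticks;                          /* models jiffies_64 */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;     /* models jiffies_lock */
static const int64_t demo_tick_period_ns = 4000000;  /* assumed 250 Hz tick */

static void demo_lock_acquire(void)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire))
		;
}

static void demo_lock_release(void)
{
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/* Returns the number of ticks accounted by this call. */
static uint64_t demo_update_ticks(int64_t now_ns)
{
	int64_t last, delta;
	uint64_t n = 0;

	/* Lockless quick check: most callers bail out here without locking. */
	last = atomic_load_explicit(&demo_last_update_ns, memory_order_relaxed);
	if (now_ns - last < demo_tick_period_ns)
		return 0;

	demo_lock_acquire();
	/* Recheck under the lock: another caller may have updated already. */
	last = atomic_load_explicit(&demo_last_update_ns, memory_order_relaxed);
	delta = now_ns - last;
	if (delta >= demo_tick_period_ns) {
		n = (uint64_t)(delta / demo_tick_period_ns);
		demo_ticks += n;
		atomic_store_explicit(&demo_last_update_ns,
				      last + (int64_t)n * demo_tick_period_ns,
				      memory_order_relaxed);
	}
	demo_lock_release();
	return n;
}
```

In this sketch a caller that arrives less than one period after the last
update returns before it touches the lock, so the lock is only contended
by callers that race around a period boundary.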

We also have mechanisms to keep a gazillion CPUs from calling this. Why
are they not working, or are some of the call sites missing them?

I'm not against reducing the seqcount write scope per se, but it needs a
proper and correct explanation.

> By the way, last_jiffies_update is protected by jiffies_lock, so
> reducing the jiffies_seq critical area is safe.

This is misleading. The write to last_jiffies_update is serialized by
the jiffies lock, but it also has to be inside the sequence count write
section because tick_nohz_next_event() does:

	/* Read jiffies and the time when jiffies were updated last */
	do {
		seq = read_seqcount_begin(&jiffies_seq);
		basemono = last_jiffies_update;
		basejiff = jiffies;
	} while (read_seqcount_retry(&jiffies_seq, seq));

So there is no 'By the way'.
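
To make the dependency concrete, here is a user-space sketch of a
sequence counter built on C11 atomics. It is an illustration under
assumptions, not the kernel's seqcount implementation: all demo_* names
are hypothetical, and the writer is assumed to be serialized by a lock
held by the caller. The point it shows is that both values the reader
samples are written between the two counter increments.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for jiffies_seq, last_jiffies_update, jiffies. */
static atomic_uint demo_seq;
static _Atomic int64_t demo_basemono_ns;
static _Atomic uint64_t demo_jiffies;

/* Caller is assumed to hold the writer-side lock. */
static void demo_write_pair(int64_t new_basemono_ns, uint64_t new_jiffies)
{
	unsigned int s = atomic_load_explicit(&demo_seq, memory_order_relaxed);

	/* Odd count: a write is in progress, readers must retry. */
	atomic_store_explicit(&demo_seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	/* Both values are updated inside the write section. */
	atomic_store_explicit(&demo_basemono_ns, new_basemono_ns, memory_order_relaxed);
	atomic_store_explicit(&demo_jiffies, new_jiffies, memory_order_relaxed);

	/* Even count again: the pair is consistent. */
	atomic_store_explicit(&demo_seq, s + 2, memory_order_release);
}

/* Lockless reader: retries until it sees an unchanged, even count. */
static void demo_read_pair(int64_t *basemono_ns, uint64_t *basejiff)
{
	unsigned int s;

	do {
		do {
			s = atomic_load_explicit(&demo_seq, memory_order_acquire);
		} while (s & 1u);

		*basemono_ns = atomic_load_explicit(&demo_basemono_ns, memory_order_relaxed);
		*basejiff = atomic_load_explicit(&demo_jiffies, memory_order_relaxed);

		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&demo_seq, memory_order_relaxed) != s);
}
```

If the store to demo_basemono_ns were moved outside the two counter
updates, the reader's retry check would no longer detect a concurrent
change to it, and it could return a new timestamp paired with an old
tick count, or the other way around.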

Thanks,

tglx