Re: [PATCH] tick/sched: fix data races at tick_do_timer_cpu

From: Qian Cai
Date: Wed Mar 04 2020 - 06:21:01 EST




> On Mar 4, 2020, at 4:39 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> They are reported, but are they actually a real problem?
>
> This completely lacks analysis why these 8 places need the
> READ/WRITE_ONCE() treatment at all and if so why the other 14 places
> accessing tick_do_timer_cpu are safe without it.
>
> I definitely appreciate the work done with KCSAN, but just making the
> tool shut up does not cut it.

Look at tick_sched_do_timer(), for example:

#ifdef CONFIG_NO_HZ_COMMON
	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
#ifdef CONFIG_NO_HZ_FULL
		WARN_ON(tick_nohz_full_running);
#endif
		tick_do_timer_cpu = cpu;
	}
#endif

	/* Check, if the jiffies need an update */
	if (tick_do_timer_cpu == cpu)
		tick_do_update_jiffies64(now);

Can we rule out that some compiler on some arch will optimize this, if !CONFIG_NO_HZ_FULL, into:

	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE) ||
	    tick_do_timer_cpu == cpu)
		tick_do_update_jiffies64(now);

That would save a branch, and/or the compiler could realize that tick_do_timer_cpu is not used later on this CPU and discard the store?
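
To make the concern concrete, here is a minimal userspace sketch of what marking the accesses in that snippet would look like. It is not the patch itself. The READ_ONCE()/WRITE_ONCE() macros below are simplified volatile-cast stand-ins for the kernel ones (they rely on the GNU __typeof__ extension), and tick_sched_do_timer_sketch(), tick_do_update_jiffies64_stub() and jiffies_updates are hypothetical names used only for illustration.

```c
/*
 * Simplified stand-ins for the kernel macros (assumption: the real ones
 * do more checking). The volatile cast forces the compiler to emit
 * exactly one load or store for each marked access.
 */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define TICK_DO_TIMER_NONE (-1)	/* stand-in value for this sketch */

static int tick_do_timer_cpu = TICK_DO_TIMER_NONE;
static unsigned long jiffies_updates;	/* hypothetical counter */

/* Hypothetical stub standing in for tick_do_update_jiffies64(). */
static void tick_do_update_jiffies64_stub(void)
{
	jiffies_updates++;
}

/* Hypothetical userspace version of the snippet, with marked accesses. */
static void tick_sched_do_timer_sketch(int cpu)
{
	/* Marked store: the compiler may not drop it or merge it away. */
	if (READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_NONE)
		WRITE_ONCE(tick_do_timer_cpu, cpu);

	/* Marked load: a fresh read, not a value cached from above. */
	if (READ_ONCE(tick_do_timer_cpu) == cpu)
		tick_do_update_jiffies64_stub();
}
```

With the volatile accesses, the store in the first branch has to be emitted, so the two conditions cannot be folded into a single test that skips it. The sketch only illustrates the compiler side; it says nothing about whether the remaining plain accesses are safe.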

I am not familiar enough with the other 14 places to say whether they can also be accessed concurrently, so I took a pragmatic approach: for now, deal only with the places that KCSAN has confirmed, and treat any further places incrementally once they are confirmed as well.