Re: weird loadavg on idle machine post 5.7
From: Qais Yousef
Date: Tue Jul 07 2020 - 05:20:12 EST
On 07/06/20 16:59, Peter Zijlstra wrote:
[...]
> @@ -4104,12 +4108,19 @@ static void __sched notrace __schedule(bool preempt)
> local_irq_disable();
> rcu_note_context_switch(preempt);
>
> + prev_state = prev->state;
> +
> /*
> - * Make sure that signal_pending_state()->signal_pending() below
> - * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
> - * done by the caller to avoid the race with signal_wake_up().
> + * __set_current_state(@state)
> + * schedule()                          signal_wake_up()
> + *   prev_state = p->state               set_tsk_thread_flag(p, TIF_SIGPENDING)
> + *                                       wake_up_state(p, state)
> + *   LOCK rq->lock                         LOCK p->pi_lock
> + *   smp_mb__after_spinlock()              smp_mb__after_spinlock()
> + *     if (signal_pending_state())         if (p->state & @state)
> + *
> *
> - * The membarrier system call requires a full memory barrier
> + * Also, the membarrier system call requires a full memory barrier
> * after coming from user-space, before storing to rq->curr.
> */
> rq_lock(rq, &rf);
> @@ -4120,10 +4131,30 @@ static void __sched notrace __schedule(bool preempt)
> update_rq_clock(rq);
>
> switch_count = &prev->nivcsw;
> - if (!preempt && prev->state) {
> - if (signal_pending_state(prev->state, prev)) {
> + /*
> + * We must re-load p->state in case ttwu_runnable() changed it
> + * before we acquired rq->lock.
> + */
> + if (!preempt && prev_state && prev_state == prev->state) {
I think the compiler can't optimize the `prev_state == prev->state` check
away, since the smp_mb__after_spinlock() implies a compiler barrier and
forces prev->state to be re-read after rq->lock is taken. I'm still not
sure whether it's worth making the prev->state accesses READ_ONCE() though?
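
Something like the below is what I had in mind, by the way. Just an
untested sketch on top of this patch, to show where the READ_ONCE()s
would go:

	/*
	 * READ_ONCE() stops the compiler from fusing or tearing the
	 * two loads of ->state. The snapshot is taken before rq->lock;
	 * the re-read happens after rq->lock + smp_mb__after_spinlock()
	 * and so observes any TASK_RUNNING store done on the wakeup
	 * side.
	 */
	prev_state = READ_ONCE(prev->state);

	...

	rq_lock(rq, &rf);
	smp_mb__after_spinlock();

	...

	if (!preempt && prev_state && prev_state == READ_ONCE(prev->state)) {

If nothing else, the READ_ONCE() would document that ->state can change
underneath us at this point.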
Thanks
--
Qais Yousef