Re: [PATCH v6 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

From: Peter Zijlstra
Date: Mon Jul 13 2020 - 09:36:22 EST


On Mon, Jul 13, 2020 at 01:12:46PM +0100, Qais Yousef wrote:
> On 07/13/20 13:21, Peter Zijlstra wrote:

> > It's Monday, and I cannot get my brain working.. I cannot decipher the
> > comments you have with the smp_[rw]mb(); what actual ordering do they
> > enforce?
>
> It was a bit of paranoia, to ensure that readers on other CPUs see the new
> value after this point.

IIUC that's not something any barrier can provide.

Barriers can only order between (at least) two memory operations:

  X = 1;          y = Y;
  smp_wmb();      smp_rmb();
  Y = 1;          x = X;

guarantees that if y == 1, then x must also be 1. Because the left hand
side orders the store of Y after the store of X, while the right hand
side orders the load of X after the load of Y. Therefore, if the first
load observes the last store, the second load must observe the first
store.

Without a second variable, barriers can't guarantee _anything_. Which is
why any barrier comment should refer to at least two variables.
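
As a concrete (made-up) example of that pairing -- not code from the patch --
take a producer/consumer pair over two variables; the smp_wmb()/smp_rmb() pair
only tells you something about "data" because "ready" is there as the second
variable:

  static int data, ready;

  /* writer side */
  static void producer(void)
  {
          WRITE_ONCE(data, 1);            /* X = 1                */
          smp_wmb();                      /* order the two stores */
          WRITE_ONCE(ready, 1);           /* Y = 1                */
  }

  /* reader side */
  static void consumer(void)
  {
          if (READ_ONCE(ready)) {         /* y = Y                */
                  smp_rmb();              /* order the two loads  */
                  BUG_ON(READ_ONCE(data) != 1);   /* x = X must be 1 */
          }
  }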

> > Also, your synchronize_rcu() relies on write_lock() being
> > non-preemptible, which isn't true on PREEMPT_RT.
> >
> > The below seems simpler...

> Hmm maybe I am missing something obvious, but beside the race with fork; I was
> worried about another race and that's what the synchronize_rcu() is trying to
> handle.
>
> It's the classic preemption in the middle of RMW operation race.
>
>   copy_process()                        sysctl_uclamp
>
>     sched_post_fork()
>       __uclamp_sync_rt()
>         // read sysctl
>         // PREEMPT
>                                           for_each_process_thread()
>         // RESUME
>         // write sysctl to p
>

> 2. sysctl_uclamp happens *during* sched_post_fork()
>
> There's the risk of the classic preemption in the middle of RMW where another
> CPU could have changed the shared variable after the current CPU has already
> read it, but before writing it back.

Aah.. I see.
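
Concretely, the window looks something like this -- a rough sketch only, the
helper and sysctl names approximate the patch rather than quote it:

  static void __uclamp_sync_rt(struct task_struct *p)
  {
          /* read the sysctl default into a local copy */
          unsigned int def = sysctl_sched_uclamp_util_min_rt_default;

          /*
           * <-- preempted here: the sysctl handler can change the default
           *     and finish its for_each_process_thread() walk before we
           *     resume ...
           */

          /* ... and we then write the stale value into the task */
          uclamp_se_set(&p->uclamp_req[UCLAMP_MIN], def, false);
  }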

> I protect this with rcu_read_lock(): as far as I know, if the sysctl update
> happens while a forkee is inside this section, synchronize_rcu() will wait
> for it to finish. New forkees entering the rcu_read_lock() section will be
> okay because they should see the new value.
>
> Spinlocks and mutexes seemed inferior to this approach.
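
The scheme described would then be roughly the following -- again a sketch,
with illustrative names rather than the actual patch code:

  struct task_struct *g, *p;

  /* fork side, sched_post_fork(): */
  rcu_read_lock();
  __uclamp_sync_rt(p);            /* read the default, write it into p  */
  rcu_read_unlock();

  /* sysctl side, after storing the new default: */
  synchronize_rcu();              /* any forkee still inside the read-side
                                   * section above has finished by now   */
  rcu_read_lock();
  for_each_process_thread(g, p)
          __uclamp_sync_rt(p);    /* so nobody is left with a stale copy */
  rcu_read_unlock();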

Well, didn't we just write in another patch that p->uclamp_* was
protected by both rq->lock and p->pi_lock?