Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock

From: Paul E. McKenney
Date: Thu Mar 10 2022 - 10:09:07 EST


On Thu, Mar 10, 2022 at 09:45:17AM +0100, Peter Zijlstra wrote:
> On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > > we must already have rq lock held, then the RCU read lock can
> > > > > be optimized away.
> > > > >
> > > > > > And do the same thing in its wrapper cgroup_account_cputime(),
> > > > > > but we can't use lockdep_assert_rq_held() there, which is defined
> > > > > > in kernel/sched/sched.h.
> > > > >
> > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > > > > Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
> > > >
> > > > This patch landed recently in linux-next as commit dc6e0818bc9a
> > > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> > > > > found that it triggers the following warning in the early boot stage:
> > > >
> > > > Calibrating delay loop (skipped), value calculated using timer
> > > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > > pid_max: default: 32768 minimum: 301
> > > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > > CPU: Testing write buffer coherency: ok
> > > > CPU0: Spectre v2: using BPIALL workaround
> > > >
> > > > =============================
> > > > WARNING: suspicious RCU usage
> > > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > > -----------------------------
> > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > >
> > > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > > to include rcu_read_lock_sched_held() or its equivalent by default, I
> > > > suppose.
> > >
> > > Paul?
> >
> > That would reduce the number of warnings, but it also would hide bugs.
> >
> > So, are you sure you really want this?
>
> I don't understand... Since the flavours got merged regular RCU has its
> quiescent state held off by preempt_disable. So how can relying on that
> cause bugs?

Someone forgets an rcu_read_lock() and there happens to be something
like a preempt_disable() that by coincidence covers that particular
rcu_dereference(). The kernel therefore doesn't complain. That someone
goes on to other things, maybe even posthumously. Then some time later
the preempt_disable() goes away, for good and sufficient reasons.

Good luck figuring out where to put the needed rcu_read_lock() and
rcu_read_unlock().
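
As a rough illustration of that failure mode, here is a userspace toy
model, not kernel code.  Every model_* name is made up for this sketch,
and the two counters merely stand in for what lockdep would track.  It
compares a strict default check against a lenient one that also accepts
disabled preemption:

```c
/* Userspace toy model, not kernel code: all model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

static int model_rcu_depth;     /* stands in for rcu_read_lock() nesting */
static int model_preempt_depth; /* stands in for preempt_disable() nesting */

static void model_preempt_disable(void)
{
	model_preempt_depth++;
}

static void model_preempt_enable(void)
{
	assert(model_preempt_depth > 0);
	model_preempt_depth--;
}

/* Strict default: only an explicit read-side critical section counts. */
static bool model_strict_ok(void)
{
	return model_rcu_depth > 0;
}

/* Lenient default: disabled preemption also counts as being a reader. */
static bool model_lenient_ok(void)
{
	return model_rcu_depth > 0 || model_preempt_depth > 0;
}

/*
 * A reader whose author forgot rcu_read_lock(), so model_rcu_depth is
 * never raised.  Returns true if the chosen check would stay silent.
 */
static bool model_buggy_reader_is_silent(bool lenient)
{
	return lenient ? model_lenient_ok() : model_strict_ok();
}
```

In this model the lenient check stays silent for as long as the caller
happens to have preemption disabled and starts complaining only once
that goes away, whereas the strict check complains from the start.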

> And if we can rely on that, then surely rcu_dereference_check() ought
> to play by the same rules, otherwise we get silly warnings like these at
> hand.
>
> Specifically, we removed the rcu_read_lock() here because this has
> rq->lock held, which is a raw_spinlock_t which very much implies preempt
> disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> be disabled.
>
> There is no possible way for RCU to make progress.

Then let's have that particular rcu_dereference_check() explicitly state
what it needs, which seems to be either rcu_read_lock() on the one hand
or disabled preemption (as implied by holding rq->lock) on the other.
Right now, that could be just this:

	p = rcu_dereference_check(gp, rcu_read_lock_sched_held());

Or am I missing something here?
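
For what it is worth, the per-call-site approach can be sketched in
userspace as follows.  This is a toy model under stated assumptions, not
kernel code: all model_* names are hypothetical, model_rq_lock() only
mimics the "raw spinlock implies preemption disabled" property, and the
warning is reduced to a counter.  The one thing taken from the real API
is the shape of the check, namely that rcu_dereference_check(p, c) is
satisfied by either an RCU read-side critical section or the condition c:

```c
/* Userspace toy model, not kernel code: all model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

static int model_rcu_depth;     /* stands in for rcu_read_lock() nesting */
static int model_preempt_depth; /* raised while the model rq lock is held */
static int model_warnings;      /* counts "suspicious RCU usage" reports */

static void model_rcu_read_lock(void)   { model_rcu_depth++; }
static void model_rcu_read_unlock(void) { assert(model_rcu_depth > 0); model_rcu_depth--; }
static void model_rq_lock(void)         { model_preempt_depth++; }
static void model_rq_unlock(void)       { assert(model_preempt_depth > 0); model_preempt_depth--; }

static bool model_rcu_read_lock_held(void) { return model_rcu_depth > 0; }
static bool model_sched_held(void)         { return model_preempt_depth > 0; }

/* Accept the access if we are an RCU reader or the caller's condition holds. */
static void *model_dereference_check(void *p, bool c)
{
	if (!model_rcu_read_lock_held() && !c)
		model_warnings++;
	return p;
}

static int model_gp_target = 42;
static int *model_gp = &model_gp_target;

/* Call site that explicitly states its extra requirement. */
static int model_charge_site(void)
{
	int *p = model_dereference_check(model_gp, model_sched_held());

	return *p;
}

/* Unrelated call site that keeps relying on the strict default. */
static int model_other_site(void)
{
	int *p = model_dereference_check(model_gp, false);

	return *p;
}
```

The point of the sketch is that only the call site that asked for it
accepts the lock-held case.  Every other site still gets a complaint if
its author forgot rcu_read_lock().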

Thanx, Paul