Re: Finally starting on short RCU grace periods, but...
From: Paul E. McKenney
Date: Thu Aug 06 2020 - 13:34:54 EST
On Thu, Aug 06, 2020 at 03:25:57PM +0200, Dmitry Vyukov wrote:
> On Thu, Aug 6, 2020 at 3:22 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >
> > On Thu, Aug 6, 2020 at 12:31 PM Marco Elver <elver@xxxxxxxxxx> wrote:
> > >
> > > +Cc kasan-dev
Thank you!
> > > On Thu, 6 Aug 2020 at 01:08, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > >
> > > > Hello!
> > > >
> > > > If I remember correctly, one of you asked for a way to shorten RCU
> > > > grace periods so that KASAN would have a better chance of detecting bugs
> > > > such as pointers being leaked out of RCU read-side critical sections.
> > > > I am finally starting entering and testing code for this, but realized
> > > > that I had forgotten a couple of things:
> > > >
> > > > 1. I don't remember exactly who asked, but I suspect that it was
> > > > Kostya. I am using his Reported-by as a placeholder for the
> > > > moment, but please let me know if this should be adjusted.
> > >
> > > It certainly was not me.
> > >
> > > > 2. Although this work is necessary to detect situtions where
> > > > call_rcu() is used to initiate a grace period, there already
> > > > exists a way to make short grace periods that are initiated by
> > > > synchronize_rcu(), namely, the rcupdate.rcu_expedited kernel
> > > > boot parameter. This will cause all calls to synchronize_rcu()
> > > > to act like synchronize_rcu_expedited(), resulting in about 2-3
> > > > orders of magnitude reduction in grace-period latency on small
> > > > systems (say 16 CPUs).
> > > >
> > > > In addition, I plan to make a few other adjustments that will
> > > > increase the probability of KASAN spotting a pointer leak even in the
> > > > rcupdate.rcu_expedited case.
> > >
> > > Thank you, that'll be useful I think.
> > >
> > > > But if you would like to start this sort of testing on current mainline,
> > > > rcupdate.rcu_expedited is your friend!
> >
> > Hi Paul,
> >
> > This is great!
> >
> > I understand it's not a sufficiently challenging way of tracking
> > things, but it's simply here ;)
> > https://bugzilla.kernel.org/show_bug.cgi?id=208299
> > (now we also know who asked for this, +Jann)
Thank you, and I will update the Reported-by lines accordingly.
> > I've tested on the latest mainline and with rcupdate.rcu_expedited=1
> > it boots to ssh successfully and I see:
> > [ 0.369258][ T0] All grace periods are expedited (rcu_expedited).
> >
> > I have created https://github.com/google/syzkaller/pull/2021 to enable
> > it on syzbot.
> > On syzbot we generally use only 2-4 CPUs per VM, so it should be even better.
Sounds good, and perhaps this will answer Marco's question below. ;-)
> > > Do any of you remember some bugs we missed due to this? Can we find
> > > them if we add this option?
> >
> > The problem is that it's hard to remember bugs that were not caught :)
> > Here is an approximation of UAFs with free in rcu callback:
> > https://groups.google.com/forum/#!searchin/syzkaller-bugs/KASAN$20use-after-free$20rcu_do_batch%7Csort:date
> > The ones with low hit count are the ones that we almost did not catch.
> > That's the best estimation I can think of. Also potentially we can get
> > reproducers for such bugs without reproducers.
> > Maybe we will be able to correlate some bugs/reproducers that appear
> > soon with this change.
>
> Wait, it was added in 2012?
> https://github.com/torvalds/linux/commit/3705b88db0d7cc4
Indeed it was, which is my current excuse for having failed to immediately
mention it to Jann during our IRC discussion.
The purpose back then was to make battery-powered systems go faster,
I think mostly focused on CPU hotplug operations. At least that would
explain the commit log being indefinite on the exact benefit. ;-)
Thanx, Paul