Re: Finally starting on short RCU grace periods, but...

From: Dmitry Vyukov
Date: Thu Aug 06 2020 - 13:15:11 EST

On Thu, Aug 6, 2020 at 12:31 PM Marco Elver <elver@xxxxxxxxxx> wrote:
> +Cc kasan-dev
> On Thu, 6 Aug 2020 at 01:08, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > Hello!
> >
> > If I remember correctly, one of you asked for a way to shorten RCU
> > grace periods so that KASAN would have a better chance of detecting bugs
> > such as pointers being leaked out of RCU read-side critical sections.
> > I am finally starting to enter and test code for this, but realized
> > that I had forgotten a couple of things:
> >
> > 1. I don't remember exactly who asked, but I suspect that it was
> > Kostya. I am using his Reported-by as a placeholder for the
> > moment, but please let me know if this should be adjusted.
> It certainly was not me.
> > 2. Although this work is necessary to detect situations where
> > call_rcu() is used to initiate a grace period, there already
> > exists a way to make short grace periods that are initiated by
> > synchronize_rcu(), namely, the rcupdate.rcu_expedited kernel
> > boot parameter. This will cause all calls to synchronize_rcu()
> > to act like synchronize_rcu_expedited(), resulting in about 2-3
> > orders of magnitude reduction in grace-period latency on small
> > systems (say 16 CPUs).
> >
> > In addition, I plan to make a few other adjustments that will
> > increase the probability of KASAN spotting a pointer leak even in the
> > rcupdate.rcu_expedited case.
> Thank you, that'll be useful I think.
> > But if you would like to start this sort of testing on current mainline,
> > rcupdate.rcu_expedited is your friend!

Hi Paul,

This is great!

I understand it's not a sufficiently challenging way of tracking
things, but it's simply here ;)
(now we also know who asked for this, +Jann)

I've tested the latest mainline with rcupdate.rcu_expedited=1; it
boots to ssh successfully and I see:
[ 0.369258][ T0] All grace periods are expedited (rcu_expedited).

I have created to enable
it on syzbot.
On syzbot we generally use only 2-4 CPUs per VM, so it should be even better.

> Do any of you remember some bugs we missed due to this? Can we find
> them if we add this option?

The problem is that it's hard to remember bugs that were not caught :)
Here is an approximation: UAFs where the free happens in an RCU callback:
!searchin/syzkaller-bugs/KASAN$20use-after-free$20rcu_do_batch%7Csort:date
The ones with low hit count are the ones that we almost did not catch.
That's the best estimate I can think of. Potentially we can also get
reproducers for those of these bugs that currently have none.
Maybe we will also be able to correlate some of the bugs/reproducers
that appear soon after this change with it.