Re: [RFC-PATCH 1/2] mm: Add __GFP_NO_LOCKS flag
From: Paul E. McKenney
Date: Thu Aug 13 2020 - 14:53:01 EST
On Thu, Aug 13, 2020 at 08:26:18PM +0200, peterz@xxxxxxxxxxxxx wrote:
> On Thu, Aug 13, 2020 at 04:34:57PM +0200, Thomas Gleixner wrote:
> > Michal Hocko <mhocko@xxxxxxxx> writes:
> > > On Thu 13-08-20 15:22:00, Thomas Gleixner wrote:
> > >> It basically requires converting the wait queue to something else. Is
> > >> the waitqueue strictly single-waiter?
> > >
> > > I would have to double check. From what I remember only kswapd should
> > > ever sleep on it.
> >
> > That would make it trivial as we could simply switch it over to rcu_wait.
> >
> > >> So that should be:
> > >>
> > >> if (!preemptible() && gfp == GFP_RT_NOWAIT)
> > >>
> > >> which is limiting the damage to those callers which hand in
> > >> GFP_RT_NOWAIT.
> > >>
> > >> lockdep will yell at invocations with gfp != GFP_RT_NOWAIT when it hits
> > >> zone->lock in the wrong context. And we want to know about that so we
> > >> can look at the caller and figure out how to solve it.
> > >
> > > Yes, that would somehow need to annotate the zone_lock as OK in
> > > those paths so that lockdep doesn't complain.
> >
> > That opens the worst of all cans of worms. If we start this here then
> > Joe programmer and his dog will use these lockdep annotation to evade
> > warnings and when exposed to RT it will fall apart in pieces. Just that
> > at that point Joe programmer moved on to something else and the usual
> > suspects can mop up the pieces. We've seen that all over the place and
> > some people even disable lockdep temporarily because annotations don't
> > help.
> >
> > PeterZ might have opinions about that too I suspect.
>
> PeterZ is mightily confused by all of this -- also heat induced brain
> melt.
>
> I thought the rule was:
>
> - No allocators (alloc/free) inside raw_spinlock_t, full-stop.
>
> Why are we trying to craft an exception?
So that we can reduce post-grace-period cache misses by a factor of
eight when invoking RCU callbacks. This reduction in cache misses also
makes it more difficult to overrun RCU with floods of either call_rcu()
or kfree_rcu() invocations.
The idea is to allocate page-sized arrays of pointers so that the callback
invocation can sequence through the array instead of walking a linked
list, hence the reduction in cache misses.
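To make the cache-miss argument concrete, here is a minimal userspace
sketch (not the actual kernel code; names like bulk_page and
PTRS_PER_PAGE are made up for illustration) of sequencing through a
page-sized pointer array, where the walk over ptrs[] is contiguous and
only the page-to-page hop chases a pointer:

```c
/*
 * Userspace sketch only (hypothetical names, not the kernel code):
 * a page holds a small header plus a contiguous array of object
 * pointers, so the free pass walks ptrs[] sequentially instead of
 * chasing one cache-missing pointer per object.
 */
#include <assert.h>
#include <stdlib.h>

#define PTRS_PER_PAGE 510        /* illustrative slot count per page */

struct bulk_page {
	struct bulk_page *next;  /* pages themselves still chain */
	int nr;                  /* filled slots in ptrs[] */
	void *ptrs[PTRS_PER_PAGE];
};

/* Free every recorded object; returns how many were freed. */
static int free_bulk(struct bulk_page *page)
{
	int freed = 0;

	while (page) {
		struct bulk_page *next = page->next;
		int i;

		for (i = 0; i < page->nr; i++, freed++)
			free(page->ptrs[i]);
		free(page);
		page = next;
	}
	return freed;
}
```

With 510 objects per page, the linked-list pointer chase happens once
per page rather than once per object, which is where the roughly 8x
cache-miss reduction comes from.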
If the allocation fails, for example during OOM events, we fall back to
the linked-list approach. So, as with much of the rest of the kernel,
under OOM we just run a bit slower.
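Here is a minimal userspace sketch of that fallback (again, all names
are made up for illustration): each object embeds a list node,
rcu_head-style, so when no pointer-array page is available the object
is simply queued on a plain linked list:

```c
/*
 * Userspace sketch only (hypothetical names, not the kernel code):
 * prefer the page-sized pointer array; if none could be allocated
 * (e.g. the no-sleep allocation failed under OOM), fall back to an
 * embedded linked-list node, so queueing itself never allocates.
 */
#include <assert.h>
#include <stddef.h>

struct obj {
	struct obj *next;        /* embedded node, like rcu_head */
};

struct batch {
	void **slots;            /* pointer-array page; NULL under OOM */
	int nr, cap;
	struct obj *head;        /* linked-list fallback */
};

/* Returns 1 when queued on the array fast path, 0 on the fallback. */
static int queue_obj(struct batch *b, struct obj *o)
{
	if (b->slots && b->nr < b->cap) {
		b->slots[b->nr++] = o;
		return 1;
	}
	o->next = b->head;       /* one pointer chase per object later */
	b->head = o;
	return 0;
}
```

The point is that the fallback costs nothing extra at queueing time;
the penalty is only paid at callback-invocation time, one cache miss
per object instead of one per page.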
Thanx, Paul