Re: [PATCH v4 00/14] Implement call_rcu_lazy() and miscellaneous fixes

From: Frederic Weisbecker
Date: Mon Aug 29 2022 - 09:41:04 EST


On Fri, Aug 19, 2022 at 08:48:43PM +0000, Joel Fernandes (Google) wrote:
> Refresh tested on real ChromeOS userspace and hardware, passes boot time tests
> and rcuscale tests.
>
> Fixes on top of v3:
> - Fix boot issues due to a race in the lazy RCU logic which caused a missed
> wakeup of the RCU GP thread, causing synchronize_rcu() to stall.
> - Fixed trace_rcu_callback tracepoint
>
> I tested power previously [1], I am in the process of testing power again but I
> wanted share my latest code as others who are testing power as well could use
> the above fixes.

Your patch is very likely to be _generally_ useful and therefore,
the more I look into this, the more I wonder if it is a good idea to rely on
bypass at all, let alone NOCB. Of course in the long term the goal is to have
bypass working without NOCB, but why even bother implementing it for NOCB
in the first place?

Several highlights:

1) NOCB is most often needed for nohz_full and the latter has terrible power
management. CPU 0 is active all the time there.

2) NOCB without nohz_full has an extremely rare use case (RT niche:
https://lore.kernel.org/lkml/CAFzL-7vqTX-y06Kc3HaLqRWAYE0d=ms3TzVtZLn0c6ATrKD+Qw@xxxxxxxxxxxxxx/)

3) NOCB implies performance issues.

4) We are mixing up two very different things in a single list of callbacks:
lazy callbacks and flooding callbacks. As a result we are adding lots of
off-topic corner cases all around:
* a separate lazy len field to struct rcu_cblist, whose purpose is much more
general than just bypass/lazy
* "lazy" specialized parameters to general purpose cblist management
functions

5) This further complicates the bypass core code, nocb timer management and core
nocb group management, all of which are already very complicated.

6) The !NOCB implementation is going to be very different.

OK, I can admit one counter-argument in favour of using NOCB:

-1) The scheduler can benefit from an awake CPU to run the callbacks on behalf of a bunch
of idle CPUs, instead of waking up that bunch of CPUs. But still we are dealing
with callbacks that can actually wait...


So here is a proposal: how about forgetting NOCB for now and instead adding a new
RCU_LAZY_TAIL segment in struct rcu_segcblist right after RCU_NEXT_TAIL?
Then ignore that segment until some timer has expired or the CPU is known to be
busy. Probably some tiny bits need to be tweaked in the segcblist management
functions, but probably not that much. And also make sure that entrain()
queues to RCU_LAZY_TAIL.
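
To make the idea a bit more concrete, here is a minimal userspace sketch of
a segmented callback list with an extra lazy segment at the end. It is not
kernel code: struct cb, the SEG_* names, enqueue_lazy(), flush_lazy(),
pending_ignoring_lazy() and seg_len() are all made up for illustration. Only
the order of the first four segments mirrors struct rcu_segcblist. There is
no locking, no grace-period sequence handling, and entrain() is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct rcu_head (hypothetical). */
struct cb {
	struct cb *next;
};

/* The first four mirror the kernel's segments; SEG_LAZY_TAIL is the proposal. */
enum {
	SEG_DONE_TAIL,
	SEG_WAIT_TAIL,
	SEG_NEXT_READY_TAIL,
	SEG_NEXT_TAIL,
	SEG_LAZY_TAIL,		/* hypothetical new segment */
	SEG_NSEGS,
};

struct segcblist {
	struct cb *head;
	struct cb **tails[SEG_NSEGS];	/* tails[i] ends segment i */
};

static void segcblist_init(struct segcblist *l)
{
	l->head = NULL;
	for (int i = 0; i < SEG_NSEGS; i++)
		l->tails[i] = &l->head;	/* every segment starts empty */
}

/* Regular callback: goes at the end of NEXT, in front of any lazy ones. */
static void enqueue(struct segcblist *l, struct cb *cb)
{
	cb->next = *l->tails[SEG_NEXT_TAIL];	/* first lazy cb, or NULL */
	*l->tails[SEG_NEXT_TAIL] = cb;
	/* An empty lazy segment shares its tail with NEXT: move it along. */
	if (l->tails[SEG_LAZY_TAIL] == l->tails[SEG_NEXT_TAIL])
		l->tails[SEG_LAZY_TAIL] = &cb->next;
	l->tails[SEG_NEXT_TAIL] = &cb->next;
}

/* Lazy callback: appended to the very end of the list. */
static void enqueue_lazy(struct segcblist *l, struct cb *cb)
{
	cb->next = NULL;
	*l->tails[SEG_LAZY_TAIL] = cb;
	l->tails[SEG_LAZY_TAIL] = &cb->next;
}

/* Timer expired or CPU busy: lazy callbacks become regular NEXT ones. */
static void flush_lazy(struct segcblist *l)
{
	l->tails[SEG_NEXT_TAIL] = l->tails[SEG_LAZY_TAIL];
}

/* Anything that needs a grace period, the lazy segment being ignored? */
static bool pending_ignoring_lazy(struct segcblist *l)
{
	return l->tails[SEG_DONE_TAIL] != l->tails[SEG_NEXT_TAIL];
}

/* Number of callbacks in one segment, found by walking it. */
static long seg_len(struct segcblist *l, int seg)
{
	struct cb **pp;
	long n = 0;

	assert(seg >= 0 && seg < SEG_NSEGS);
	pp = seg ? l->tails[seg - 1] : &l->head;
	for (; pp != l->tails[seg]; pp = &(*pp)->next)
		n++;
	return n;
}
```

In this model the lazy callbacks simply sit past RCU_NEXT_TAIL, so code that
only looks at the first four segments does not see them, and flushing is a
single tail pointer update. The lazy count is derived by walking the segment
here only to keep the sketch short.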

Then the only difference in the case of NOCB is that we add a new timer to the
nocb group leader instead of a local timer in !NOCB.
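
As a rough illustration of that difference, again a userspace sketch with
made-up names (struct cpu_model, lazy_timer_owner(), count_lazy_timers())
and not the actual rcu_data layout: the only thing that changes is which CPU
owns the timer that eventually flushes the lazy segment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU model, not the kernel's struct rcu_data. */
struct cpu_model {
	int cpu;		/* this CPU's number */
	bool nocb;		/* are its callbacks offloaded? */
	int nocb_gp_leader;	/* group leader CPU, only meaningful if nocb */
};

/* CPU whose timer flushes the lazy callbacks queued on c. */
static int lazy_timer_owner(const struct cpu_model *c)
{
	return c->nocb ? c->nocb_gp_leader : c->cpu;
}

/* Number of distinct timers needed to cover the given CPUs. */
static int count_lazy_timers(const struct cpu_model *cpus, int n)
{
	int timers = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		int owner = lazy_timer_owner(&cpus[i]);
		bool seen = false;

		/* Has an earlier CPU already accounted for this owner? */
		for (int j = 0; j < i; j++)
			if (lazy_timer_owner(&cpus[j]) == owner)
				seen = true;
		if (!seen)
			timers++;
	}
	return timers;
}
```

With four offloaded CPUs sharing one group leader this gives a single timer,
while the same four CPUs without NOCB each get a local one.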

Now of course I'm certainly overlooking obvious things as always :)

Thanks.