Re: Severe performance regression w/ 4.4+ on Android due to cgroup locking changes

From: John Stultz
Date: Wed Jul 13 2016 - 18:25:43 EST


On Wed, Jul 13, 2016 at 2:42 PM, Paul E. McKenney
<paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Jul 13, 2016 at 02:18:41PM -0700, Paul E. McKenney wrote:
>> On Wed, Jul 13, 2016 at 05:05:26PM -0400, Tejun Heo wrote:
>> > On Wed, Jul 13, 2016 at 02:03:15PM -0700, Paul E. McKenney wrote:
>> > > Take the patch that I just sent out and make the choice of normal
>> > > vs. expedited depend on CONFIG_PREEMPT_RT or whatever the -rt guys are
>> > > calling it these days. Is there a low-latency Kconfig option other
>> > > than CONFIG_NO_HZ_FULL?
>> >
>> > Sounds like a plan to me.
>>
>> I like the way we like each other's idea. Mutually assured laziness? ;-)
>
> But here is what mine might look like. Untested, probably does
> not even build. Note that the default is -no- expediting, use the
> rcusync.expedited kernel parameter to enable it.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 82b42c958d1c..b8bc9854e548 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3229,6 +3229,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> energy efficiency by requiring that the kthreads
> periodically wake up to do the polling.
>
> + rcusync.expedited [KNL]
> + Specify that the rcusync mechanism use expedited
> + grace periods. As of mid-2016, this affects
> + per-CPU rwsems.
> +
> rcutree.blimit= [KNL]
> Set maximum number of finished RCU callbacks to
> process in one batch.
> diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> index be922c9f3d37..5bc5bef2e00a 100644
> --- a/kernel/rcu/sync.c
> +++ b/kernel/rcu/sync.c
> @@ -22,6 +22,14 @@
>
> #include <linux/rcu_sync.h>
> #include <linux/sched.h>
> +#include <linux/moduleparam.h>
> +#include <linux/module.h>
> +
> +MODULE_ALIAS("rcusync");
> +#ifdef MODULE_PARAM_PREFIX
> +#undef MODULE_PARAM_PREFIX
> +#endif
> +#define MODULE_PARAM_PREFIX "rcusync."
>
> #ifdef CONFIG_PROVE_RCU
> #define __INIT_HELD(func) .held = func,
> @@ -29,7 +37,7 @@
> #define __INIT_HELD(func)
> #endif
>
> -static const struct {
> +static struct {
>  	void (*sync)(void);
>  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
>  	void (*wait)(void);
> @@ -62,6 +70,20 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
>
> #define rss_lock gp_wait.lock
>
> +static bool expedited;
> +module_param(expedited, bool, 0444);
> +
> +static int __init rcu_sync_early_init(void)
> +{
> +	if (expedited) {
> +		gp_ops[RCU_SYNC].sync = synchronize_rcu_expedited;
> +		gp_ops[RCU_SCHED_SYNC].sync = synchronize_sched_expedited;
> +		gp_ops[RCU_BH_SYNC].sync = synchronize_rcu_bh_expedited;
> +	}

One minor nit here: with the config-based default, you might want to
print an informative message stating that expedited grace periods are
in use. That would help narrow down whether it was enabled when folks
report problems, since it wouldn't otherwise be obvious from a dmesg
log.
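
To make the idea concrete, here is a rough userspace model of it, not
kernel code and not tested against the patch. In the kernel this would
presumably be a single pr_info() call inside rcu_sync_early_init(). The
message text, the log buffer, and every name ending in _model are
assumptions made up for illustration.

```c
/*
 * Userspace model only, NOT kernel code. The message text and all
 * *_model names are hypothetical; a kernel version would log with
 * pr_info() instead of writing to a local buffer.
 */
#include <stdbool.h>
#include <stdio.h>

enum rcu_sync_type { RCU_SYNC, RCU_SCHED_SYNC, RCU_BH_SYNC, NR_SYNC_TYPES };

/* Stand-ins for the normal and expedited grace-period primitives. */
static void sync_normal_model(void) { }
static void sync_expedited_model(void) { }

/* Non-const, as in the patch, so the table can be changed at boot. */
static struct {
	void (*sync)(void);
} gp_ops_model[NR_SYNC_TYPES] = {
	[RCU_SYNC]       = { .sync = sync_normal_model },
	[RCU_SCHED_SYNC] = { .sync = sync_normal_model },
	[RCU_BH_SYNC]    = { .sync = sync_normal_model },
};

/* Stand-in for the kernel log buffer that dmesg would show. */
static char log_model[128];

/* Switch the table if requested, then record which mode was chosen. */
static const char *rcu_sync_init_model(bool expedited)
{
	int i;

	if (expedited) {
		for (i = 0; i < NR_SYNC_TYPES; i++)
			gp_ops_model[i].sync = sync_expedited_model;
	}
	snprintf(log_model, sizeof(log_model),
		 "rcu_sync: using %s grace periods",
		 expedited ? "expedited" : "normal");
	return log_model;
}
```

Logging in both the expedited and the normal case means one line in
the log answers the question either way, whether the setting came from
the boot parameter or from a config default.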

thanks
-john