Re: rcu_preempt caused oom

From: Paul E. McKenney
Date: Wed Dec 12 2018 - 21:42:45 EST


On Thu, Dec 13, 2018 at 02:11:35AM +0000, Zhang, Jun wrote:
> Hello, Paul
>
> I think the next patch is better.
> Because ULONG_CMP_GE could cause double write, which has risk that write back old value.
> Please help review.
> I don't test it. If you agree, we will test it.

Just to make sure that I understand, you are worried about something like
the following, correct?

o __note_gp_changes() compares rnp->gp_seq_needed and rdp->gp_seq_needed
and finds them equal.

o At just this time something like rcu_start_this_gp() assigns a new
(larger) value to rdp->gp_seq_needed.

o Then __note_gp_changes() overwrites rdp->gp_seq_needed with the
old value.

This cannot happen because __note_gp_changes() runs with interrupts
disabled on the CPU corresponding to the rcu_data structure referenced
by the rdp pointer. So there is no way for rcu_start_this_gp() to be
invoked on the same CPU during this "if" statement.

Of course, there could be bugs. For example:

o __note_gp_changes() might be called on a different CPU than that
corresponding to rdp. You can check this with something like:

WARN_ON_ONCE(rdp->cpu != smp_processor_id());

o The same things could happen with rcu_start_this_gp(), and the
above WARN_ON_ONCE() would work there as well.

o rcutree_prepare_cpu() is a special case, but is irrelevant unless
you are doing CPU-hotplug operations. (It can run on a CPU other
than rdp->cpu, but only at times when rdp->cpu is offline.)

o Interrupts might not really be disabled.

That said, your patch could reduce overhead slightly, given that the
two values will be equal much of the time. So it might be worth testing
just for that reason.

So why not just test it anyway? If it makes the bug go away, I will be
surprised, but it would not be the first surprise for me. ;-)

Thanx, Paul

> Thanks!
>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0b760c1..c00f34e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
> zero_cpu_stall_ticks(rdp);
> }
> rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
> - if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
> + if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> rdp->gp_seq_needed = rnp->gp_seq_needed;
> WRITE_ONCE(rdp->gpwrap, false);
> rcu_gpnum_ovf(rnp, rdp);
>
>
> -----Original Message-----
> From: Paul E. McKenney [mailto:paulmck@xxxxxxxxxxxxx]
> Sent: Thursday, December 13, 2018 08:12
> To: He, Bo <bo.he@xxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; josh@xxxxxxxxxxxxxxxx; mathieu.desnoyers@xxxxxxxxxxxx; jiangshanlai@xxxxxxxxx; Zhang, Jun <jun.zhang@xxxxxxxxx>; Xiao, Jin <jin.xiao@xxxxxxxxx>; Zhang, Yanmin <yanmin.zhang@xxxxxxxxx>; Bai, Jie A <jie.a.bai@xxxxxxxxx>; Sun, Yi J <yi.j.sun@xxxxxxxxx>
> Subject: Re: rcu_preempt caused oom
>
> On Wed, Dec 12, 2018 at 11:13:22PM +0000, He, Bo wrote:
> > I don't see the rcutree.sysrq_rcu parameter in v4.19 kernel, I also checked the latest kernel and the latest tag v4.20-rc6, not see the sysrq_rcu.
> > Please correct me if I have something wrong.
>
> That would be because I sent you the wrong patch, apologies! :-/
>
> Please instead see the one below, which does add sysrq_rcu.
>
> Thanx, Paul
>
> > -----Original Message-----
> > From: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
> > Sent: Thursday, December 13, 2018 5:03 AM
> > To: He, Bo <bo.he@xxxxxxxxx>
> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>;
> > linux-kernel@xxxxxxxxxxxxxxx; josh@xxxxxxxxxxxxxxxx;
> > mathieu.desnoyers@xxxxxxxxxxxx; jiangshanlai@xxxxxxxxx; Zhang, Jun
> > <jun.zhang@xxxxxxxxx>; Xiao, Jin <jin.xiao@xxxxxxxxx>; Zhang, Yanmin
> > <yanmin.zhang@xxxxxxxxx>; Bai, Jie A <jie.a.bai@xxxxxxxxx>
> > Subject: Re: rcu_preempt caused oom
> >
> > On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > > On Wed, Dec 12, 2018 at 01:21:33PM +0000, He, Bo wrote:
> > > > we reproduce on two boards, but I still not see the show_rcu_gp_kthreads() dump logs, it seems the patch can't catch the scenario.
> > > > I double confirmed the CONFIG_PROVE_RCU=y is enabled in the config as it's extracted from the /proc/config.gz.
> > >
> > > Strange.
> > >
> > > Are the systems responsive to sysrq keys once failure occurs? If
> > > so, I will provide you a sysrq-R or some such to dump out the RCU state.
> >
> > Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the patch below. Only lightly tested.
>
> ------------------------------------------------------------------------
>
> commit 04b6245c8458e8725f4169e62912c1fadfdf8141
> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
> Date: Wed Dec 12 16:10:09 2018 -0800
>
> rcu: Add sysrq rcu_node-dump capability
>
> Backported from v4.21/v5.0
>
> Life is hard if RCU manages to get stuck without triggering RCU CPU
> stall warnings or triggering the rcu_check_gp_start_stall() checks
> for failing to start a grace period. This commit therefore adds a
> boot-time-selectable sysrq key (commandeering "y") that allows manually
> dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
> must be set for this sysrq to be available.
>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 0b760c1369f7..e9392a9d6291 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -61,6 +61,7 @@
> #include <linux/trace_events.h>
> #include <linux/suspend.h>
> #include <linux/ftrace.h>
> +#include <linux/sysrq.h>
>
> #include "tree.h"
> #include "rcu.h"
> @@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT; int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> /* panic() on RCU Stall sysctl. */
> int sysctl_panic_on_rcu_stall __read_mostly;
> +/* Commandeer a sysrq key to dump RCU's tree. */ static bool sysrq_rcu;
> +module_param(sysrq_rcu, bool, 0444);
>
> /*
> * The rcu_scheduler_active variable is initialized to the value @@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void) } EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
>
> +/* Dump grace-period-request information due to commandeered sysrq. */
> +static void sysrq_show_rcu(int key) {
> + show_rcu_gp_kthreads();
> +}
> +
> +static struct sysrq_key_op sysrq_rcudump_op = {
> + .handler = sysrq_show_rcu,
> + .help_msg = "show-rcu(y)",
> + .action_msg = "Show RCU tree",
> + .enable_mask = SYSRQ_ENABLE_DUMP,
> +};
> +
> +static int __init rcu_sysrq_init(void)
> +{
> + if (sysrq_rcu)
> + return register_sysrq_key('y', &sysrq_rcudump_op);
> + return 0;
> +}
> +early_initcall(rcu_sysrq_init);
> +
> /*
> * Send along grace-period-related data for rcutorture diagnostics.
> */
>