Re: [PATCH RFC v2 tip/core/rcu 03/22] rcutorture: Add flag to produce non-busy-wait task stalls

From: Paul E. McKenney
Date: Fri Mar 20 2020 - 12:11:21 EST


On Fri, Mar 20, 2020 at 12:03:29PM +0800, Hillf Danton wrote:
>
> On Thu, 19 Mar 2020 08:22:53 -0700 "Paul E. McKenney" wrote:
> >
> > On Thu, Mar 19, 2020 at 09:39:47PM +0800, Hillf Danton wrote:
> > >
> > > On Thu, 19 Mar 2020 05:38:12 -0700 "Paul E. McKenney" wrote:
> > > >
> > > > On Thu, Mar 19, 2020 at 06:46:14PM +0800, Hillf Danton wrote:
> > > > >
> > > > > On Wed, 18 Mar 2020 17:10:41 -0700
> > > > > > static int rcu_torture_stall(void *args)
> > > > > > {
> > > > > > + int idx;
> > > > > > unsigned long stop_at;
> > > > > >
> > > > > > VERBOSE_TOROUT_STRING("rcu_torture_stall task started");
> > > > > > @@ -1610,21 +1612,22 @@ static int rcu_torture_stall(void *args)
> > > > > > if (!kthread_should_stop()) {
> > > > > > stop_at = ktime_get_seconds() + stall_cpu;
> > > > > > /* RCU CPU stall is expected behavior in following code. */
> > > > > > - rcu_read_lock();
> > > > > > + idx = cur_ops->readlock();
> > > > > > if (stall_cpu_irqsoff)
> > > > > > local_irq_disable();
> > > > > > - else
> > > > > > + else if (!stall_cpu_block)
> > > > > > preempt_disable();
> > > > > > pr_alert("rcu_torture_stall start on CPU %d.\n",
> > > > > > - smp_processor_id());
> > > > > > + raw_smp_processor_id());
> > > > > > while (ULONG_CMP_LT((unsigned long)ktime_get_seconds(),
> > > > > > stop_at))
> > > > > > - continue; /* Induce RCU CPU stall warning. */
> > > > > > + if (stall_cpu_block)
> > > > > > + schedule_timeout_uninterruptible(HZ);
> > > > >
> > > > > Why is the scheduled-in task so special that it will be running on
> > > > > the current CPU with irq disabled?
> > > >
> > > > You lost me on this one.
> > >
> > > Quite likely :)
> > >
> > > > IRQs are not at all disabled.
> > > >
> > > > > > if (stall_cpu_irqsoff)
> > > > > > local_irq_enable();
> > >
> > > Local IRQs get enabled here depending on stall_cpu_irqsoff.
> > >
> > > What I was asking is the scheduling case like
> > >
> > > local_irq_disable();
> > > schedule_timeout(HZ);
> > > local_irq_enable();
> > >
> > > Is it likely going to be ruled out in this patch?
> >
> > If an rcutorture run specified both the rcutorture.stall_cpu_irqsoff and
> > the rcutorture.stall_cpu_block module parameters, then yes, exactly the
> > sequence you call out should occur. Can't say that I have tried this,
> > though. Nor would I expect to have ever done so without your suggesting
> > that I do.
> >
> > But why not try it on current -rcu?
> >
> > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 --duration 3 --configs "TRACE01" --bootargs "rcutorture.stall_cpu=25 rcutorture.stall_cpu_holdoff=30 rcutorture.stall_cpu_block=1 rcupdate.rcu_task_stall_timeout=10000 rcutorture.stall_cpu_irqsoff"
> >
> > This tells rcutorture to use all 12 hardware threads, to run the kernel for
> > three minutes, to run only the TRACE01 rcutorture scenario, and to test
> > RCU CPU stall warnings:
> >
> > rcutorture.stall_cpu=25: Stall the CPU for 25 seconds.
> >
> > rcutorture.stall_cpu_holdoff=30: Wait 30 seconds after boot to start stalling.
> >
> > rcutorture.stall_cpu_block=1: Do the schedule_timeout_uninterruptible()
> > while stalling.
> >
> > rcupdate.rcu_task_stall_timeout=10000: Set the stall-warning timeout
> > to 10,000 jiffies, or ten seconds.
> >
> > rcutorture.stall_cpu_irqsoff: This tells rcutorture to execute the
> > local_irq_disable() that you called out above.
> >
> > And this results in a couple of stall warning messages, as expected
>
> Given these warning messages,
>
> > given that you get two ten-second intervals in a 25-second interval.
>
> I suspect it is likely to induce RCU CPU stall using
> schedule_timeout_uninterruptible(HZ).

Exactly. In fact, inducing an RCU CPU stall warning is the whole purpose
of those rcutorture module parameters.

But it depends on the flavor of RCU in use. Feel free to experiment!

And again, I do not expect to be using rcutorture.stall_cpu_block=1 and
rcutorture.stall_cpu_irqsoff at the same time except as a demonstration
of something silly. So I do not see the need to make a change. If you
are advocating some change or reporting some bug, I am missing your point.

Thanx, Paul