Re: [rcu_sched stall] regression/miss-config ?

From: Paul E. McKenney
Date: Tue May 17 2016 - 15:15:35 EST


On Tue, May 17, 2016 at 06:46:22AM -0700, santosh.shilimkar@xxxxxxxxxx wrote:
> On 5/16/16 5:58 PM, Paul E. McKenney wrote:
> >On Mon, May 16, 2016 at 12:49:41PM -0700, Santosh Shilimkar wrote:
> >>On 5/16/2016 10:34 AM, Paul E. McKenney wrote:
> >>>On Mon, May 16, 2016 at 09:33:57AM -0700, Santosh Shilimkar wrote:
>
> [...]
>
> >>>Are you running CONFIG_NO_HZ_FULL=y? If so, the problem might be that
> >>>you need more housekeeping CPUs than you currently have configured.
> >>>
> >>Yes, CONFIG_NO_HZ_FULL=y. Do you mean "CONFIG_NO_HZ_FULL_ALL=y" for
> >>bookkeeping? It seems that without it the clock-event code will just use
> >>CPU0 for things like broadcasting, which might become a bottleneck.
> >>This could explain the hrtimer_interrupt() path getting slowed
> >>down because of the bookkeeping bottleneck.
> >>
> >>$cat .config | grep NO_HZ
> >>CONFIG_NO_HZ_COMMON=y
> >># CONFIG_NO_HZ_IDLE is not set
> >>CONFIG_NO_HZ_FULL=y
> >># CONFIG_NO_HZ_FULL_ALL is not set
> >># CONFIG_NO_HZ_FULL_SYSIDLE is not set
> >>CONFIG_NO_HZ=y
> >># CONFIG_RCU_FAST_NO_HZ is not set
> >
> >Yes, CONFIG_NO_HZ_FULL_ALL=y would give you only one CPU for all
> >housekeeping tasks, including the RCU grace-period kthreads. So you are
> >booting without any nohz_full boot parameter? You can end up with the
> >same problem with CONFIG_NO_HZ_FULL=y and the nohz_full boot parameter
> >that you can with CONFIG_NO_HZ_FULL_ALL=y.
> >
> I see. Yes, the systems are booting without the nohz_full boot parameter.
> I will try adding more housekeeping CPUs and update the thread
> after verification, since it takes time to reproduce the issue.
>
> Thanks for the discussion so far, Paul. It's very insightful for me.

Please let me know how things go with further testing, especially with
the priority setting.
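
For what it is worth, here is a rough sketch of how one might work out
which CPUs remain available for housekeeping once a nohz_full CPU list
is in place. It only does the arithmetic on a cpulist string of the
form accepted by the nohz_full boot parameter (for example "1-7"). The
function names, the CPU count, and the sample lists are made up for
illustration and are not taken from the system under discussion.

```python
def parse_cpulist(text):
    """Parse a cpulist string such as "1-3,8" into a sorted list of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list or a trailing comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def housekeeping_cpus(nr_cpus, nohz_full):
    """Return the CPUs in 0..nr_cpus-1 that are NOT in the nohz_full list."""
    full = set(parse_cpulist(nohz_full))
    return [cpu for cpu in range(nr_cpus) if cpu not in full]


if __name__ == "__main__":
    # Hypothetical 8-CPU box: everything but CPU 0 is nohz_full, which
    # leaves a single housekeeping CPU, the situation described above
    # for CONFIG_NO_HZ_FULL_ALL=y.
    print(housekeeping_cpus(8, "1-7"))
    # Leaving CPUs 0-1 out of the list gives two housekeeping CPUs.
    print(housekeeping_cpus(8, "2-7"))
```

On a running system the list could be taken from the kernel command
line (/proc/cmdline) rather than typed in by hand, but that part is
left out of the sketch.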

Thanx, Paul