Re: [rcu_sched stall] regression/miss-config ?

From: Santosh Shilimkar
Date: Thu May 19 2016 - 19:45:45 EST


Hi Paul,

On 5/17/2016 12:15 PM, Paul E. McKenney wrote:
On Tue, May 17, 2016 at 06:46:22AM -0700, santosh.shilimkar@xxxxxxxxxx wrote:
On 5/16/16 5:58 PM, Paul E. McKenney wrote:
On Mon, May 16, 2016 at 12:49:41PM -0700, Santosh Shilimkar wrote:
On 5/16/2016 10:34 AM, Paul E. McKenney wrote:
On Mon, May 16, 2016 at 09:33:57AM -0700, Santosh Shilimkar wrote:

[...]

Are you running CONFIG_NO_HZ_FULL=y? If so, the problem might be that
you need more housekeeping CPUs than you currently have configured.

Yes, CONFIG_NO_HZ_FULL=y. Do you mean "CONFIG_NO_HZ_FULL_ALL=y" for
housekeeping? It seems that without it, the clock-event code will just
use CPU0 for things like broadcasting, which might become a bottleneck.
That could explain the hrtimer_interrupt() path getting slowed down by
the housekeeping bottleneck.

$ cat .config | grep NO_HZ
CONFIG_NO_HZ_COMMON=y
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
# CONFIG_NO_HZ_FULL_ALL is not set
# CONFIG_NO_HZ_FULL_SYSIDLE is not set
CONFIG_NO_HZ=y
# CONFIG_RCU_FAST_NO_HZ is not set

Yes, CONFIG_NO_HZ_FULL_ALL=y would give you only one CPU for all
housekeeping tasks, including the RCU grace-period kthreads. So you are
booting without any nohz_full boot parameter? You can end up with the
same problem with CONFIG_NO_HZ_FULL=y and the nohz_full boot parameter
as you can with CONFIG_NO_HZ_FULL_ALL=y.

I see. Yes, the systems are booting without the nohz_full boot parameter.
I will try adding more CPUs to it and update the thread after
verification, since it takes time to reproduce the issue.

Thanks for the discussion so far, Paul. It has been very insightful for me.

Please let me know how things go with further testing, especially with
the priority setting.
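
(For the record, my understanding of the nohz_full side, please correct
me if wrong: with CONFIG_NO_HZ_FULL=y but no nohz_full= boot parameter,
no CPU is adaptive-tick and every CPU remains available for
housekeeping. It is only a command line along the lines of

	nohz_full=1-7

on, say, an 8-CPU guest that would confine all housekeeping work,
including the RCU grace-period kthread, to CPU0.)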

Sorry for the delay. I managed to get information about the XEN use
case and its custom config, as discussed above. To reduce variables,
I disabled CONFIG_NO_HZ_FULL altogether, so the effective settings were:

CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_TREE_RCU_TRACE=y
CONFIG_RCU_KTHREAD_PRIO=1
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_TRACE=y
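
One thing I still want to double-check (just a sketch; the exact ps
field names depend on the procps version) is whether
CONFIG_RCU_KTHREAD_PRIO=1 really left the grace-period kthread running
at SCHED_FIFO priority 1 on these kernels:

# show RT priority and scheduling class of the RCU kthreads
$ ps -eo pid,rtprio,policy,comm | grep rcu

If rtprio comes back as '-' and policy as 'TS', the kthread is plain
SCHED_OTHER and any RT hog can starve it.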

Unfortunately the XEN test still failed; the log is at the end of
the email. This test is a bit peculiar, though, since it is a
database running in a VM with 1 or 2 CPUs. One suspicion is that
because the database RT processes are hogging the CPU(s), the kernel
RCU kthread is not getting a chance to run, which eventually results
in the stall. Does that make sense?
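
If that theory holds, one experiment I can think of (only a sketch;
it needs root, and the priority value is a guess that has to sit
above whatever the database RT tasks use) would be to bump the
rcu_sched kthread by hand and see whether the stalls go away under
the same load:

# find the RCU grace-period kthread
$ pgrep -x rcu_sched
# raise it to SCHED_FIFO, e.g. priority 50 (assumed to be above the DB)
$ chrt -f -p 50 <pid-of-rcu_sched>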

Please note that it is a non-preempt kernel running RT processes. ;-)

# cat .config | grep PREEMPT
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
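
One more data point I should probably collect (an assumption on my
part, not verified on these boxes yet): as far as I know, on a
non-preempt kernel the main thing standing between runaway RT tasks
and a starved SCHED_OTHER thread like rcu_sched is the RT throttling
sysctl, so it is worth checking whether the database setup turns it
off:

$ sysctl kernel.sched_rt_period_us kernel.sched_rt_runtime_us

The defaults are 1000000 and 950000 (RT tasks capped at 95% of each
second); a runtime of -1 disables throttling and lets RT tasks
monopolize the CPU completely.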

Regards,
Santosh
...
....
rcu_sched kthread starved for 399032 jiffies!
INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=462037 jiffies, g=118888, c=118887, q=0)
All QSes seen, last rcu_sched kthread activity 462037 (4296277632-4295815595), jiffies_till_next_fqs=3, root ->qsmask 0x0
ocssd.bin R running task 0 15375 1 0x00000000
0000000000000000 ffff8800ec003bc8 ffffffff810a8581 ffffffff81abf980
000000000001d068 ffff8800ec003c28 ffffffff810e9c98 0000000000000000
0000000000000086 0000000000000000 0000000000000086 0000000000000082
Call Trace:
<IRQ> [<ffffffff810a8581>] sched_show_task+0xb1/0x120
[<ffffffff810e9c98>] print_other_cpu_stall+0x288/0x2d0
[<ffffffff810e9e60>] __rcu_pending+0x180/0x230
[<ffffffff810e9fa5>] rcu_check_callbacks+0x95/0x140
[<ffffffff810edf92>] update_process_times+0x42/0x70
[<ffffffff810ffc79>] tick_sched_handle+0x39/0x80
[<ffffffff810ffee2>] tick_sched_timer+0x52/0xa0
[<ffffffff810f0424>] __run_hrtimer+0x74/0x1d0
[<ffffffff810ffe90>] ? tick_nohz_handler+0xc0/0xc0
[<ffffffff810f07b2>] hrtimer_interrupt+0x102/0x240
[<ffffffff8100aaae>] xen_timer_interrupt+0x2e/0x130
[<ffffffff8143347a>] ? add_interrupt_randomness+0x3a/0x1f0
[<ffffffff81384290>] ? store_cursor_blink+0xc0/0xc0
[<ffffffff810d9de4>] handle_irq_event_percpu+0x54/0x1b0
[<ffffffff810dd477>] handle_percpu_irq+0x47/0x70
[<ffffffff810d9d27>] generic_handle_irq+0x27/0x40
[<ffffffff813e53ca>] evtchn_2l_handle_events+0x25a/0x260
[<ffffffff81085541>] ? __do_softirq+0x191/0x2f0
[<ffffffff813e2c2f>] __xen_evtchn_do_upcall+0x4f/0x90
[<ffffffff813e3404>] xen_evtchn_do_upcall+0x34/0x50
[<ffffffff816c761e>] xen_hvm_callback_vector+0x6e/0x80
<EOI>
rcu_sched kthread starved for 462037 jiffies!