On Sun, May 15, 2016 at 09:35:40PM -0700, santosh.shilimkar@xxxxxxxxxx wrote:
On 5/15/16 2:18 PM, Santosh Shilimkar wrote:
Hi Paul,
I was asking Sasha about [1] since other folks in Oracle
have also stumbled upon similar RCU stalls with the v4.1
kernel in different workloads. A similar issue was reported
to me with RDS as well, and looking at [1], [2], [3] and
[4], I thought of reaching out to see if you can help us
understand this issue better.
I have also included the RCU-specific config used in these
tests. It is very hard to reproduce the issue, but one data
point is that it reproduces on systems with a larger number
of CPUs (64+). The same workload on systems with fewer than
64 CPUs does not show the issue. Someone also told me that
using the SLAB allocator instead of SLUB makes a difference,
but I haven't verified that part for RDS.
Let me know your thoughts. Thanks in advance!!
One of my colleagues told me the pastebin server I used is
Oracle internal only, so I am adding the relevant logs with
this email.
[1] https://lkml.org/lkml/2014/12/14/304
[2] Log 1 snippet:
-----------------------------------------------------------------
INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched self-detected stall on CPU { 54} (t=60000 jiffies
g=66023 c=66022 q=0)
Task dump for CPU 54:
ksoftirqd/54 R running task 0 389 2 0x00000008
0000000000000007 ffff88ff7f403d38 ffffffff810a8621 0000000000000036
ffffffff81ab6540 ffff88ff7f403d58 ffffffff810a86cf 0000000000000086
ffffffff81ab6940 ffff88ff7f403d88 ffffffff810e3ad3 ffffffff81ab6540
Call Trace:
<IRQ> [<ffffffff810a8621>] sched_show_task+0xb1/0x120
[<ffffffff810a86cf>] dump_cpu_task+0x3f/0x50
[<ffffffff810e3ad3>] rcu_dump_cpu_stacks+0x83/0xc0
[<ffffffff810e490c>] print_cpu_stall+0xfc/0x170
[<ffffffff810e5eeb>] __rcu_pending+0x2bb/0x2c0
[<ffffffff810e5f8d>] rcu_check_callbacks+0x9d/0x170
[<ffffffff810e9772>] update_process_times+0x42/0x70
[<ffffffff810fb589>] tick_sched_handle+0x39/0x80
[<ffffffff810fb824>] tick_sched_timer+0x44/0x80
[<ffffffff810ebc04>] __run_hrtimer+0x74/0x1d0
[<ffffffff810fb7e0>] ? tick_nohz_handler+0xa0/0xa0
[<ffffffff810ebf92>] hrtimer_interrupt+0x102/0x240
[<ffffffff810521f9>] local_apic_timer_interrupt+0x39/0x60
[<ffffffff816c47b5>] smp_apic_timer_interrupt+0x45/0x59
[<ffffffff816c263e>] apic_timer_interrupt+0x6e/0x80
<EOI> [<ffffffff8118db64>] ? free_one_page+0x164/0x380
[<ffffffff8118de43>] ? __free_pages_ok+0xc3/0xe0
[<ffffffff8118e775>] __free_pages+0x25/0x40
[<ffffffffa21054f0>] rds_message_purge+0x60/0x150 [rds]
[<ffffffffa2105624>] rds_message_put+0x44/0x80 [rds]
[<ffffffffa21535b4>] rds_ib_send_cqe_handler+0x134/0x2d0 [rds_rdma]
[<ffffffff816c102b>] ? _raw_spin_unlock_irqrestore+0x1b/0x50
[<ffffffffa18c0273>] ? mlx4_ib_poll_cq+0xb3/0x2a0 [mlx4_ib]
[<ffffffffa214c6f1>] poll_cq+0xa1/0xe0 [rds_rdma]
[<ffffffffa214d489>] rds_ib_tasklet_fn_send+0x79/0xf0 [rds_rdma]
The most likely possibility is that there is a 60-second-long loop in
one of the above functions. This is within bottom-half execution, so
unfortunately the usual trick of placing cond_resched_rcu_qs() within this
loop, but outside of any RCU read-side critical section does not work.
Therefore, if there really is a loop here, one fix would be to
periodically unwind back out to run_ksoftirqd(), but setting up so that
the work would be continued later. Another fix might be to move this
from tasklet context to workqueue context, where cond_resched_rcu_qs()
can be used -- however, this looks a bit like networking code, which
does not always take kindly to being run in process context (though
careful use of local_bh_disable() and local_bh_enable() can sometimes
overcome this issue). A third fix, which works only if this code does
not use RCU and does not invoke any code that does use RCU, is to tell
RCU that it should ignore this code (which will require a little work
on RCU, as it currently does not tolerate this sort of thing aside from
the idle threads). In this last approach, event-tracing calls must use
the _nonidle suffix.
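To make the second option a bit more concrete, below is a
rough, untested sketch of what polling the completion queue
from workqueue context might look like. The names here
(struct rds_ib_cq_poller, rds_ib_cq_poll_work, poll_cq_batch)
and the batch size are invented for illustration and are not
the actual RDS code; the point is only that, between bounded
batches, the worker drops out of bottom-half context and can
call cond_resched_rcu_qs() so that RCU grace periods make
progress even if the queue keeps refilling.

#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/rcupdate.h>

struct rds_ib_cq_poller {
	struct work_struct work;
	/* ... driver-specific completion-queue state ... */
};

/* Assumed helper: drain up to @budget completions, return how many. */
static int poll_cq_batch(struct rds_ib_cq_poller *poller, int budget);

static void rds_ib_cq_poll_work(struct work_struct *work)
{
	struct rds_ib_cq_poller *poller =
		container_of(work, struct rds_ib_cq_poller, work);

	for (;;) {
		int n;

		/* Networking code often assumes BHs are disabled. */
		local_bh_disable();
		n = poll_cq_batch(poller, 64);
		local_bh_enable();

		if (n == 0)
			break;	/* completion queue drained */

		/*
		 * Outside of any RCU read-side critical section, so
		 * this reports a quiescent state and keeps the grace
		 * period from stalling on a long-running drain.
		 */
		cond_resched_rcu_qs();
	}
}

The IB completion handler would then queue_work() this worker
instead of scheduling the tasklet, and the tasklet variant of
the polling loop would go away.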
I am not familiar with the RDS code, so I cannot be more specific.
First of all, thanks for the explanation.

[5] LOG 3:
------------------------------------------------------
INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by
0, t=240007 jiffies, g=449043, c=449042, q=0)
All QSes seen, last rcu_sched kthread activity 240007
(4299336825-4299096818), jiffies_till_next_fqs=3, root ->qsmask 0x0
ora_lms0_orcltw R running task 0 22303 1 0x00000080
ffff8800c4651c00 00000000e7110062 ffff88010e803c18 ffffffff810b038f
ffff88010e8185c0 ffffffff81b22c40 ffff88010e803c98 ffffffff810e8f96
0000000000000000 ffffffff810b4deb ffff88010e817800 ffff88010e8185c0
Call Trace:
<IRQ> [<ffffffff810b038f>] sched_show_task+0xaf/0x120
[<ffffffff810e8f96>] rcu_check_callbacks+0x7e6/0x7f0
[<ffffffff810b4deb>] ? account_user_time+0x8b/0xa0
[<ffffffff810ee582>] update_process_times+0x42/0x70
[<ffffffff810fea65>] tick_sched_handle.isra.18+0x25/0x60
[<ffffffff810feae4>] tick_sched_timer+0x44/0x80
[<ffffffff810ef307>] __run_hrtimer+0x77/0x1d0
[<ffffffff810feaa0>] ? tick_sched_handle.isra.18+0x60/0x60
[<ffffffff810ef6e3>] hrtimer_interrupt+0x103/0x230
[<ffffffff8100a64d>] xen_timer_interrupt+0x3d/0x170
[<ffffffff814469a9>] ? add_interrupt_randomness+0x49/0x200
[<ffffffff810dd1ae>] handle_irq_event_percpu+0x3e/0x1a0
[<ffffffff810e0cfd>] handle_percpu_irq+0x3d/0x60
[<ffffffff810dc7ab>] generic_handle_irq+0x2b/0x40
[<ffffffff813fe91f>] evtchn_2l_handle_events+0x26f/0x280
[<ffffffff81085bba>] ? __do_softirq+0x18a/0x2d0
[<ffffffff813fbe8f>] __xen_evtchn_do_upcall+0x4f/0x90
[<ffffffff813fddf4>] xen_evtchn_do_upcall+0x34/0x50
[<ffffffff8170f35e>] xen_hvm_callback_vector+0x6e/0x80
<EOI>
rcu_sched kthread starved for 240007 jiffies!
INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by
0, t=420012 jiffies, g=449043, c=449042, q=0)
All QSes seen, last rcu_sched kthread activity 420012
(4299516830-4299096818), jiffies_till_next_fqs=3, root ->qsmask 0x0
ora_lms0_orcltw R running task 0 22303 1 0x00000080
ffff8800c4651c00 00000000e7110062 ffff88010e803c18 ffffffff810b038f
ffff88010e8185c0 ffffffff81b22c40 ffff88010e803c98 ffffffff810e8f96
0000000000000000 ffff88010e817800 0000000000017800 ffff88010e8185c0
Call Trace:
<IRQ> [<ffffffff810b038f>] sched_show_task+0xaf/0x120
[<ffffffff810e8f96>] rcu_check_callbacks+0x7e6/0x7f0
[<ffffffff810ee582>] update_process_times+0x42/0x70
[<ffffffff810fea65>] tick_sched_handle.isra.18+0x25/0x60
[<ffffffff810feae4>] tick_sched_timer+0x44/0x80
[<ffffffff810ef307>] __run_hrtimer+0x77/0x1d0
[<ffffffff810feaa0>] ? tick_sched_handle.isra.18+0x60/0x60
[<ffffffff810ef6e3>] hrtimer_interrupt+0x103/0x230
[<ffffffff8100a64d>] xen_timer_interrupt+0x3d/0x170
[<ffffffff814469a9>] ? add_interrupt_randomness+0x49/0x200
[<ffffffff810dd1ae>] handle_irq_event_percpu+0x3e/0x1a0
[<ffffffff810e0cfd>] handle_percpu_irq+0x3d/0x60
[<ffffffff810dc7ab>] generic_handle_irq+0x2b/0x40
[<ffffffff813fe91f>] evtchn_2l_handle_events+0x26f/0x280
[<ffffffff813fbe8f>] __xen_evtchn_do_upcall+0x4f/0x90
[<ffffffff813fddf4>] xen_evtchn_do_upcall+0x34/0x50
[<ffffffff8170f35e>] xen_hvm_callback_vector+0x6e/0x80
<EOI> [<ffffffff81024847>] ? do_audit_syscall_entry+0x67/0x70
[<ffffffff81025eb3>] syscall_trace_enter_phase1+0x143/0x1a0
[<ffffffff8112ac86>] ? __audit_syscall_exit+0x1e6/0x280
[<ffffffff81026216>] ? syscall_trace_leave+0xc6/0x120
[<ffffffff8170d69a>] tracesys+0xd/0x44
rcu_sched kthread starved for 420012 jiffies!
INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by
0, t=600017 jiffies, g=449043, c=449042, q=0)
All QSes seen, last rcu_sched kthread activity 600017
(4299696835-4299096818), jiffies_till_next_fqs=3, root ->qsmask 0x0
ora_lms0_orcltw R running task 0 22303 1 0x00000080
ffff8800c4651c00 00000000e7110062 ffff88010e803c18 ffffffff810b038f
ffff88010e8185c0 ffffffff81b22c40 ffff88010e803c98 ffffffff810e8f96
0000000000000000 ffffffff810b4deb ffff88010e817800 ffff88010e8185c0
Call Trace:
<IRQ> [<ffffffff810b038f>] sched_show_task+0xaf/0x120
[<ffffffff810e8f96>] rcu_check_callbacks+0x7e6/0x7f0
[<ffffffff810b4deb>] ? account_user_time+0x8b/0xa0
[<ffffffff810ee582>] update_process_times+0x42/0x70
[<ffffffff810fea65>] tick_sched_handle.isra.18+0x25/0x60
[<ffffffff810feae4>] tick_sched_timer+0x44/0x80
[<ffffffff810ef307>] __run_hrtimer+0x77/0x1d0
[<ffffffff810feaa0>] ? tick_sched_handle.isra.18+0x60/0x60
[<ffffffff810ef6e3>] hrtimer_interrupt+0x103/0x230
[<ffffffff8100a64d>] xen_timer_interrupt+0x3d/0x170
[<ffffffff814469a9>] ? add_interrupt_randomness+0x49/0x200
[<ffffffff810dd1ae>] handle_irq_event_percpu+0x3e/0x1a0
[<ffffffff810e0cfd>] handle_percpu_irq+0x3d/0x60
[<ffffffff810dc7ab>] generic_handle_irq+0x2b/0x40
[<ffffffff813fe91f>] evtchn_2l_handle_events+0x26f/0x280
[<ffffffff813fbe8f>] __xen_evtchn_do_upcall+0x4f/0x90
[<ffffffff813fddf4>] xen_evtchn_do_upcall+0x34/0x50
[<ffffffff8170f35e>] xen_hvm_callback_vector+0x6e/0x80
<EOI>
rcu_sched kthread starved for 600017 jiffies!