Re: [Bug #12650] Strange load average and ksoftirqd behavior with 2.6.29-rc2-git1
From: Paul E. McKenney
Date: Mon Feb 16 2009 - 13:56:31 EST
On Mon, Feb 16, 2009 at 08:06:13AM -0800, Paul E. McKenney wrote:
> On Mon, Feb 16, 2009 at 02:21:51PM +0100, Ingo Molnar wrote:
> >
> > * Damien Wyart <damien.wyart@xxxxxxx> wrote:
> >
> > > * Ingo Molnar <mingo@xxxxxxx> [090216 13:26]:
> > > > We do get 0x100 which is 1 << RCU_SOFTIRQ, i.e. the RCU softirq. Paul,
> > > > this indeed seems to be a CONFIG_TREE_RCU=y bug.
> > >
> > > > What is weird is that RCU_SOFTIRQ gets set again and again - but there's
> > > > no raise_softirq() calls. Could you please do a two-CPU trace too via:
> > >
> > > > echo 3 > /debug/tracing/tracing_cpumask
> > >
> > > > So that we can see what's happening on the other CPU?
> > >
> > > > Also, could you please apply the debug patch below (or update to the
> > > > very latest -tip tree), so that we get trace entries of softirq triggers
> > > > too?
> > >
> > > Ok, the new trace with these additional modifications is here:
> > > http://damien.wyart.free.fr/ksoftirqd_pb/trace_tip_2009.02.16_1300_ksoftirqd_pb_abstime_proc_mask3.txt.gz
> >
> > thanks.
> >
> > This confirms that SOFTIRQ_RCU gets raised here in the timer IRQ:
> >
> > 136.255963 | 0) sleep-2345 | | update_process_times() {
> > 136.255964 | 0) sleep-2345 | | account_process_tick() {
> > 136.255965 | 0) sleep-2345 | 0.779 us | account_system_time();
> > 136.255966 | 0) sleep-2345 | 2.262 us | }
> > 136.255967 | 0) sleep-2345 | | run_local_timers() {
> > 136.255968 | 0) sleep-2345 | 0.802 us | hrtimer_run_queues();
> > 136.255969 | 0) sleep-2345 | | raise_softirq() {
> > 136.255970 | 0) sleep-2345 | | raise_softirq_irqoff() {
> > 136.255971 | 0) sleep-2345 | | __raise_softirq_irqoff() {
> > 136.255972 | 0) sleep-2345 | | /* nr: 1 */
> > 136.255973 | 0) sleep-2345 | 2.194 us | }
> > 136.255974 | 0) sleep-2345 | 3.832 us | }
> > 136.255975 | 0) sleep-2345 | 5.491 us | }
> > 136.255976 | 0) sleep-2345 | 8.667 us | }
> > 136.255976 | 0) sleep-2345 | 0.792 us | rcu_pending();
> > 136.255978 | 0) sleep-2345 | | rcu_check_callbacks() {
> > 136.255979 | 0) sleep-2345 | 0.781 us | idle_cpu();
> > 136.255981 | 0) sleep-2345 | | raise_softirq() {
> > 136.255981 | 0) sleep-2345 | | raise_softirq_irqoff() {
> > 136.255982 | 0) sleep-2345 | | __raise_softirq_irqoff() {
> > 136.255983 | 0) sleep-2345 | | /* nr: 8 */
> > 136.255984 | 0) sleep-2345 | 1.555 us | }
> > 136.255984 | 0) sleep-2345 | 3.059 us | }
> > 136.255985 | 0) sleep-2345 | 4.594 us | }
> > 136.255986 | 0) sleep-2345 | 7.800 us | }
> > 136.255987 | 0) sleep-2345 | 0.737 us | printk_tick();
> >
> > again and again.
>
> Interesting...
>
> I will take a look!
The above sequence is more or less normal behavior -- the RCU softirq
handler rcu_process_callbacks() is being invoked once per
scheduling-clock tick, which appears to be HZ=1000 or thereabouts.
The system seems to have been mostly idle during this time period.
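
To make the once-per-tick path concrete, here is a rough sketch of what
the trace above is showing (simplified and reconstructed from the trace,
not quoted from the 2.6.29 source):

void update_process_times(int user_tick)	/* timer-tick path, simplified */
{
	int cpu = smp_processor_id();

	account_process_tick(current, user_tick);
	run_local_timers();		/* raises TIMER_SOFTIRQ -- the "nr: 1" above */
	if (rcu_pending(cpu))		/* anything for RCU to do on this CPU? */
		rcu_check_callbacks(cpu, user_tick);
	printk_tick();
	/* ... */
}

void rcu_check_callbacks(int cpu, int user)	/* simplified */
{
	/* ... note any quiescent state implied by this tick ... */
	raise_softirq(RCU_SOFTIRQ);	/* the "nr: 8" (0x100) softirq in the trace */
}

So one RCU_SOFTIRQ per tick is expected whenever rcu_pending() reports
work to do, and that is all the first part of the trace shows.
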
One oddity is that the _bh call to __rcu_process_callbacks() is invoking
force_quiescent_state() each time, and force_quiescent_state() isn't
actually doing anything. This looks like a possible mismatch between
the conditions checked in rcu_pending() and in force_quiescent_state(),
which I will look into. However, this sequence is consuming less than
10 microseconds per millisecond, so it cannot be the main cause of the
softirq load you are seeing. That said, if there really is a mismatch,
it needs to be fixed, and I will attend to it.
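
To illustrate the sort of mismatch I have in mind, consider the two
checks below. This is purely illustrative: the toy_rcu_state structure,
pending_wants_fqs(), and fqs_has_work() are made-up names for the
example, not the actual rcutree.c code.

struct toy_rcu_state {
	unsigned long jiffies_force_qs;	/* when to next try forcing quiescent states */
	long gpnum;			/* number of the current grace period */
	long completed;			/* number of the last completed grace period */
};

/* The "time to force quiescent states?" test as rcu_pending() might ask it. */
static int pending_wants_fqs(struct toy_rcu_state *rsp, unsigned long now)
{
	return (long)(now - rsp->jiffies_force_qs) >= 0;
}

/* The test force_quiescent_state() itself might apply before doing any work. */
static int fqs_has_work(struct toy_rcu_state *rsp)
{
	return rsp->completed != rsp->gpnum;	/* a grace period is actually in flight */
}

If the first test says "yes" while the second says "no", RCU_SOFTIRQ
gets raised on every tick only to have force_quiescent_state() return
without doing anything, which is exactly the pattern above. Cheap, but
pointless, and worth fixing if that is what is happening.
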
The interesting portion of the trace is later on:
137.896992 | 1) ksoftir-2302 | | do_softirq() {
137.896993 | 1) ksoftir-2302 | | __do_softirq() {
137.896993 | 1) ksoftir-2302 | | /* #1 softirq pending: 00000100 */
137.896994 | 1) ksoftir-2302 | | /* #2 softirq pending: 00000000 */
137.896995 | 1) ksoftir-2302 | | rcu_process_callbacks() {
137.896995 | 1) ksoftir-2302 | | __rcu_process_callbacks() {
137.896996 | 1) ksoftir-2302 | 0.498 us | force_quiescent_state();
137.896997 | 1) ksoftir-2302 | 1.588 us | }
137.896997 | 1) ksoftir-2302 | | __rcu_process_callbacks() {
137.896998 | 1) ksoftir-2302 | 0.475 us | force_quiescent_state();
137.896999 | 1) ksoftir-2302 | | cpu_quiet() {
137.896999 | 1) ksoftir-2302 | 0.526 us | _spin_lock_irqsave();
137.897000 | 1) ksoftir-2302 | 0.511 us | _spin_unlock_irqrestore();
137.897001 | 1) ksoftir-2302 | 2.528 us | }
137.897002 | 1) ksoftir-2302 | 4.607 us | }
137.897002 | 1) ksoftir-2302 | 7.825 us | }
137.897003 | 1) ksoftir-2302 | 0.498 us | _local_bh_enable();
137.897004 | 1) ksoftir-2302 | + 11.430 us | }
137.897005 | 1) ksoftir-2302 | + 12.572 us | }
137.897005 | 1) ksoftir-2302 | 0.549 us | _cond_resched();
137.897006 | 1) ksoftir-2302 | 0.541 us | kthread_should_stop();
137.897007 | 1) ksoftir-2302 | | schedule() {
137.897008 | 1) ksoftir-2302 | | __schedule() {
137.897008 | 1) ksoftir-2302 | 0.514 us | _spin_lock_irq();
137.897009 | 1) ksoftir-2302 | 0.594 us | update_rq_clock();
137.897011 | 1) ksoftir-2302 | | deactivate_task() {
137.897011 | 1) ksoftir-2302 | | dequeue_task() {
137.897011 | 1) ksoftir-2302 | | dequeue_task_fair() {
137.897012 | 1) ksoftir-2302 | | update_curr() {
137.897012 | 1) ksoftir-2302 | | calc_delta_fair() {
137.897013 | 1) ksoftir-2302 | 0.506 us | calc_delta_mine();
137.897014 | 1) ksoftir-2302 | 1.528 us | }
137.897015 | 1) ksoftir-2302 | 2.563 us | }
137.897015 | 1) ksoftir-2302 | 0.513 us | hrtick_start_fair();
137.897019 | 1) ksoftir-2302 | 4.662 us | }
137.897019 | 1) ksoftir-2302 | 8.213 us | }
137.897020 | 1) ksoftir-2302 | 9.195 us | }
137.897020 | 1) ksoftir-2302 | 0.960 us | find_busiest_group();
137.897022 | 1) ksoftir-2302 | 0.493 us | msecs_to_jiffies();
137.897023 | 1) ksoftir-2302 | 0.511 us | put_prev_task_fair();
137.897024 | 1) ksoftir-2302 | | pick_next_task() {
137.897024 | 1) ksoftir-2302 | 0.481 us | pick_next_task_fair();
137.897025 | 1) ksoftir-2302 | 0.491 us | pick_next_task_rt();
137.897026 | 1) ksoftir-2302 | 0.474 us | pick_next_task_fair();
137.897027 | 1) ksoftir-2302 | 0.480 us | pick_next_task_idle();
137.897028 | 1) ksoftir-2302 | 4.516 us | }
137.897029 | 1) ksoftir-2302 | | perf_counter_task_sched_out() {
137.897029 | 1) ksoftir-2302 | | __perf_counter_sched_out() {
137.897030 | 1) ksoftir-2302 | 0.516 us | _spin_lock();
137.897031 | 1) ksoftir-2302 | 1.486 us | }
137.897031 | 1) ksoftir-2302 | 2.462 us | }
137.897032 | 1) ksoftir-2302 | 0.516 us | __lock_text_start();
137.897045 | ------------------------------------------
1) ksoftir-2302 => <idle>-0
------------------------------------------
1) <idle>-0 | | /* nr: 8 */
------------------------------------------
1) <idle>-0 => ksoftir-2302
------------------------------------------
137.897064 | 1) ksoftir-2302 | | finish_task_switch() {
137.897064 | 1) ksoftir-2302 | | perf_counter_task_sched_in() {
137.897065 | 1) ksoftir-2302 | 0.508 us | _spin_lock();
137.897066 | 1) ksoftir-2302 | 1.525 us | }
137.897066 | 1) ksoftir-2302 | 2.617 us | }
137.897067 | 1) ksoftir-2302 | + 58.928 us | }
137.897067 | 1) ksoftir-2302 | + 59.926 us | }
137.897068 | 1) ksoftir-2302 | | do_softirq() {
137.897068 | 1) ksoftir-2302 | | __do_softirq() {
137.897069 | 1) ksoftir-2302 | | /* #1 softirq pending: 00000100 */
137.897070 | 1) ksoftir-2302 | | /* #2 softirq pending: 00000000 */
137.897070 | 1) ksoftir-2302 | | rcu_process_callbacks() {
137.897071 | 1) ksoftir-2302 | | __rcu_process_callbacks() {
137.897071 | 1) ksoftir-2302 | | force_quiescent_state() {
137.897073 | 1) ksoftir-2302 | 1.575 us | }
137.897073 | 1) ksoftir-2302 | | __rcu_process_callbacks() {
137.897074 | 1) ksoftir-2302 | 0.474 us | force_quiescent_state();
137.897075 | 1) ksoftir-2302 | | cpu_quiet() {
137.897075 | 1) ksoftir-2302 | 0.526 us | _spin_lock_irqsave();
137.897076 | 1) ksoftir-2302 | 0.511 us | _spin_unlock_irqrestore();
137.897077 | 1) ksoftir-2302 | 2.532 us | }
137.897078 | 1) ksoftir-2302 | 4.632 us | }
137.897078 | 1) ksoftir-2302 | 7.815 us | }
137.897079 | 1) ksoftir-2302 | 0.501 us | _local_bh_enable();
137.897080 | 1) ksoftir-2302 | + 11.405 us | }
137.897080 | 1) ksoftir-2302 | + 12.542 us | }
Here the calls to rcu_process_callbacks() are only about 75 microseconds
apart, so this function alone is consuming more than 10% of a CPU. The
strange thing is that I don't see a raise_softirq() in between, though
perhaps it gets inlined in a way that makes it invisible to ftrace.
Certainly rcu_process_callbacks() can re-invoke itself, for example when
a large number of RCU callbacks has piled up. However, there are only
29 calls to __call_rcu() over the entire trace, so that does not appear
to be the cause. Stranger still, there appear to be no calls to
rcu_do_batch() anywhere in the trace, even though it is invoked
unconditionally from __rcu_process_callbacks(). So perhaps the trace
simply wasn't covering that function?
Whatever, this pattern continues for more than 300 milliseconds(!).
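
For reference, the self-re-invocation path I am thinking of looks
roughly like this (a from-memory sketch, not a verbatim copy of
rcutree.c; time_to_force_qs() and callbacks_remain() are stand-ins for
the real checks):

static void __rcu_process_callbacks(struct rcu_state *rsp,
				    struct rcu_data *rdp)
{
	if (time_to_force_qs(rsp))	/* stand-in for the real timeout check */
		force_quiescent_state(rsp, 1);

	/* ... note quiescent states, start a new grace period if needed ... */

	rcu_do_batch(rdp);		/* invoked unconditionally */
}

static void rcu_do_batch(struct rcu_data *rdp)
{
	/* ... invoke up to a batch-limit's worth of ready callbacks ... */

	if (callbacks_remain(rdp))	/* stand-in for the real check */
		raise_softirq(RCU_SOFTIRQ);	/* handler re-invokes itself */
}

That re-raise at the end of rcu_do_batch() is the obvious way for the
handler to chain itself back to back, which makes the apparent absence
of rcu_do_batch() from the trace all the more puzzling.
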
Would you be willing to enable CONFIG_RCU_TRACE and CONFIG_TREE_RCU,
reproduce this and send the output of the debugfs files rcu/rcudata
and rcu/rcuhier? The commands for this would be:
mkdir /debug || :			# create the mount point, ignoring failure
mount -t debugfs debugfs /debug		# mount debugfs there
cat /debug/rcu/rcuhier
cat /debug/rcu/rcudata
I will try to reproduce locally as well.
Thanx, Paul