Re: [Bug #12650] Strange load average and ksoftirqd behavior with 2.6.29-rc2-git1

From: Frederic Weisbecker
Date: Sun Feb 15 2009 - 09:06:59 EST


On Sun, Feb 15, 2009 at 12:01:04PM +0100, Ingo Molnar wrote:
>
> * Damien Wyart <damien.wyart@xxxxxxx> wrote:
>
> > So I followed the tracing steps in the tutorial (with the 1 sec sleep),
> > which gave me this:
> > http://damien.wyart.free.fr/trace_2.6.29-rc5_ksoftirqd_prob.txt.gz
>
> thanks. There's definitely some weirdness visible in the trace,
> for example:
>
> 0) gpm-1879 => ksoftir-4
> ------------------------------------------
>
> 0) 0.964 us | finish_task_switch();
> 0) ! 1768184 us | }
> 0) | do_softirq() {
> 0) | __do_softirq() {
> 0) | rcu_process_callbacks() {
>
> the 1.7 seconds 'overhead' there must be a fluke - you'd notice it if
> ksoftirqd _really_ took that much time to execute.
>
> One possibility for these symptoms would be broken scheduler timestamps.
> Could you enable absolute timestamp printing via:
>
> echo funcgraph-abstime > trace_options
>
> Also, my guess is that if you boot via idle=poll, the symptoms go away.
> This would strengthen the suspicion that it's scheduler-clock troubles.
>
> Ingo

Looking at the following ksoftirqd quantum, there are some things I don't understand:

------------------------------------------
0) gpm-1879 => ksoftir-4
------------------------------------------

0) 0.681 us | finish_task_switch();
0) ! 7670.083 us | }
0) | do_softirq() {
0) | __do_softirq() {
0) | rcu_process_callbacks() {
0) | __rcu_process_callbacks() {
0) | cpu_quiet() {
0) 0.509 us | _spin_lock_irqsave();
0) | cpu_quiet_msk() {
0) 0.503 us | _spin_unlock_irqrestore();
0) 1.546 us | }
0) 3.617 us | }
0) 4.804 us | }
0) | __rcu_process_callbacks() {
0) 0.496 us | force_quiescent_state();
0) 1.608 us | }
0) 8.043 us | }
0) 0.533 us | _local_bh_enable();
0) + 10.190 us | }
0) + 11.337 us | }
0) 0.537 us | _cond_resched();
0) 0.536 us | kthread_should_stop();
0) | schedule() {
0) 0.516 us | _spin_lock_irq();
0) 0.586 us | update_rq_clock();
0) | deactivate_task() {
0) | dequeue_task() {
0) | dequeue_task_fair() {
0) | update_curr() {
0) 0.519 us | calc_delta_mine();
0) 1.603 us | }
0) 2.667 us | }
0) 3.662 us | }
0) 4.629 us | }
0) 0.937 us | find_busiest_group();
0) 0.493 us | msecs_to_jiffies();
0) 0.501 us | put_prev_task_fair();
0) | pick_next_task() {
0) 0.501 us | pick_next_task_fair();
0) 0.491 us | pick_next_task_rt();
0) 0.494 us | pick_next_task_fair();
0) 0.491 us | pick_next_task_idle();
0) 4.591 us | }
0) 0.501 us | __lock_text_start();
0) 0.699 us | finish_task_switch();
0) ! 289.895 us | }


The lines above are the internals of context_switch(), which means schedule()
entered this branch:

	next = pick_next_task(rq, prev);

	if (likely(prev != next)) {
		sched_info_switch(prev, next);
		perf_counter_task_sched_out(prev, cpu);

		rq->nr_switches++;
		rq->curr = next;
		++*switch_count;

		context_switch(rq, prev, next); /* unlocks the rq */
		/*
		 * the context switch might have flipped the stack from under
		 * us, hence refresh the local variables.
		 */
		cpu = smp_processor_id();
		rq = cpu_rq(cpu);
	}

But after that we are still in ksoftirqd, as if schedule() had bypassed the
if (prev != next) check and rescheduled the same task.

Well, I guess it's a bit off-topic here. Whether or not this is a bug in
sched_clock(), it reminds me of a possible sched_clock() problem I saw with the
tracers...
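For the record, the absolute-timestamp setup Ingo suggested can be reproduced
like this (a minimal sketch; it assumes debugfs is mounted at /sys/kernel/debug
and that the kernel was built with the function graph tracer):

```shell
# Assumes debugfs is already mounted; if not:
#   mount -t debugfs nodev /sys/kernel/debug
cd /sys/kernel/debug/tracing

# Select the function graph tracer.
echo function_graph > current_tracer

# Print an absolute timestamp on every entry, so the suspicious
# per-function deltas can be checked against a monotonic reference.
echo funcgraph-abstime > trace_options

# Let it record for a moment, then inspect the head of the buffer.
sleep 1
head -40 trace
```

With abstime enabled, a bogus 1.7 s "duration" that is not matched by a
corresponding jump in the absolute timestamps would point at the duration
accounting (and thus sched_clock()) rather than at ksoftirqd really running
that long.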



0) | do_softirq() {
0) | __do_softirq() {
0) | rcu_process_callbacks() {
0) | __rcu_process_callbacks() {
0) | force_quiescent_state() {
0) 0.478 us | __lock_text_start();
0) 0.489 us | _spin_lock();
0) | rcu_process_dyntick() {
0) 0.496 us | _spin_lock_irqsave();
0) 0.571 us | dyntick_save_progress_counter();
0) | cpu_quiet_msk() {
0) | rcu_start_gp() {
0) 0.496 us | _spin_unlock_irqrestore();
0) 1.533 us | }
0) 2.588 us | }
0) 5.784 us | }
0) 0.493 us | _spin_lock();
0) 0.501 us | _spin_unlock_irqrestore();
0) + 10.959 us | }
0) | file_free_rcu() {
0) 0.551 us | kmem_cache_free();
0) 1.613 us | }
0) + 14.486 us | }
0) | __rcu_process_callbacks() {
0) 0.458 us | force_quiescent_state();
0) 1.573 us | }
0) + 17.661 us | }
0) 0.486 us | _local_bh_enable();
0) + 20.014 us | }
0) + 21.094 us | }
0) 0.546 us | _cond_resched();
0) 0.536 us | kthread_should_stop();
0) | schedule() {
0) 0.514 us | _spin_lock_irq();
0) 0.596 us | update_rq_clock();
0) | deactivate_task() {
0) | dequeue_task() {
0) | dequeue_task_fair() {
0) | update_curr() {
0) 0.500 us | calc_delta_mine();
0) 1.570 us | }
0) 2.632 us | }
0) 3.632 us | }
0) 4.594 us | }
0) 0.846 us | find_busiest_group();
0) 0.471 us | msecs_to_jiffies();
0) 0.506 us | put_prev_task_fair();
0) | pick_next_task() {
0) 0.486 us | pick_next_task_fair();
0) 0.481 us | pick_next_task_rt();
0) 0.484 us | pick_next_task_fair();
0) 0.493 us | pick_next_task_idle();
0) 4.544 us | }
0) 0.506 us | __lock_text_start();
0) 0.554 us | finish_task_switch();
0) + 30.202 us | }
0) | do_softirq() {
0) | __do_softirq() {
0) | rcu_process_callbacks() {
0) | __rcu_process_callbacks() {
0) | cpu_quiet() {
0) 0.493 us | _spin_lock_irqsave();
0) | cpu_quiet_msk() {
0) 0.504 us | _spin_unlock_irqrestore();
0) 1.515 us | }
0) 3.535 us | }
0) 4.692 us | }
0) | __rcu_process_callbacks() {
0) 0.480 us | force_quiescent_state();
0) 1.593 us | }
0) 7.878 us | }
0) 0.491 us | _local_bh_enable();
0) 9.934 us | }
0) + 11.037 us | }
0) 0.529 us | _cond_resched();
0) 0.506 us | kthread_should_stop();
0) | schedule() {
0) 0.499 us | _spin_lock_irq();
0) 0.581 us | update_rq_clock();
0) | deactivate_task() {
0) | dequeue_task() {
0) | dequeue_task_fair() {
0) | update_curr() {
0) 0.493 us | calc_delta_mine();
0) 1.548 us | }
0) 2.583 us | }
0) 3.559 us | }
0) 4.514 us | }
0) 0.797 us | find_busiest_group();
0) 0.481 us | msecs_to_jiffies();
0) 0.499 us | put_prev_task_fair();
0) | pick_next_task() {
0) 0.481 us | pick_next_task_fair();
0) 0.479 us | pick_next_task_rt();
0) 0.483 us | pick_next_task_fair();
0) 0.481 us | pick_next_task_idle();
0) 4.488 us | }
0) 0.506 us | __lock_text_start();
0) 0.672 us | finish_task_switch();
0) ! 928.567 us | }
0) | do_softirq() {
0) | __do_softirq() {
0) | rcu_process_callbacks() {
0) | __rcu_process_callbacks() {
0) | force_quiescent_state() {
0) 0.484 us | __lock_text_start();
0) 0.511 us | _spin_lock();
0) | rcu_process_dyntick() {
0) 0.511 us | _spin_lock_irqsave();
0) 0.564 us | dyntick_save_progress_counter();
0) | cpu_quiet_msk() {
0) | rcu_start_gp() {
0) 0.503 us | _spin_unlock_irqrestore();
0) 1.548 us | }
0) 2.582 us | }
0) 6.102 us | }
0) 0.501 us | _spin_lock();
0) 0.501 us | _spin_unlock_irqrestore();
0) + 11.265 us | }
0) | file_free_rcu() {
0) 0.579 us | kmem_cache_free();
0) 1.643 us | }
0) | file_free_rcu() {
0) 0.549 us | kmem_cache_free();
0) 1.588 us | }
0) + 16.827 us | }
0) | __rcu_process_callbacks() {
0) 0.479 us | force_quiescent_state();
0) 1.593 us | }
0) + 20.024 us | }
0) 0.500 us | _local_bh_enable();
0) + 22.081 us | }
0) + 23.163 us | }
0) 0.523 us | _cond_resched();
0) 0.536 us | kthread_should_stop();
0) | schedule() {
0) 0.506 us | _spin_lock_irq();
0) 0.586 us | update_rq_clock();
0) | deactivate_task() {
0) | dequeue_task() {
0) | dequeue_task_fair() {
0) | update_curr() {
0) 0.516 us | calc_delta_mine();
0) 1.578 us | }
0) 2.628 us | }
0) 3.622 us | }
0) 4.604 us | }
0) 0.917 us | find_busiest_group();
0) 0.484 us | msecs_to_jiffies();
0) 0.501 us | put_prev_task_fair();
0) | pick_next_task() {
0) 0.486 us | pick_next_task_fair();
0) 0.491 us | pick_next_task_rt();
0) | pick_next_task_fair() {
0) 0.486 us | pick_next_task_idle();
0) 4.581 us | }
0) 0.496 us | __lock_text_start();

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/