Re: linux-next: Tree for April 14 (Call-traces: RCU/ACPI/WQ related?)

From: Sedat Dilek
Date: Tue Apr 26 2011 - 08:50:30 EST


On Tue, Apr 26, 2011 at 2:42 PM, Paul E. McKenney
<paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Apr 26, 2011 at 01:45:31PM +0200, Sedat Dilek wrote:
>> On Tue, Apr 26, 2011 at 7:06 AM, Paul E. McKenney
>> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>> > On Sun, Apr 24, 2011 at 09:43:31AM -0700, Paul E. McKenney wrote:
>> >> On Sun, Apr 24, 2011 at 11:36:44AM +0200, Sedat Dilek wrote:
>> >> > On Sun, Apr 24, 2011 at 8:27 AM, Paul E. McKenney
>> >> > <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>> >>
>> >> [ . . . ]
>> >>
>> >> > > OK, this looks unrelated, but just in case, could you please try it
>> >> > > again with the following patch?  (Not mainlinable, debug only.)
>> >> > >
>> >> > > Also, it does look like you are still seeing a grace-period hang.
>> >> > > Could you please send the output of the script?  Same one as last time.
>> >> > >
>> >> > >                                                         Thanx, Paul
>> >> > >
>> >> > > ------------------------------------------------------------------------
>> >> > >
>> >> > >  debugobjects.c |    8 +++++---
>> >> > >  1 file changed, 5 insertions(+), 3 deletions(-)
>> >> > >
>> >> > > diff --git a/lib/debugobjects.c b/lib/debugobjects.c
>> >> > > index 9d86e45..10a7c7a 100644
>> >> > > --- a/lib/debugobjects.c
>> >> > > +++ b/lib/debugobjects.c
>> >> > > @@ -289,10 +289,12 @@ static void debug_object_is_on_stack(void *addr, int onstack)
>> >> > >                 return;
>> >> > >
>> >> > >         limit++;
>> >> > > -       if (is_on_stack)
>> >> > > +       if (is_on_stack) {
>> >> > > +               struct rcu_head *p = (struct rcu_head *)addr;
>> >> > >                 printk(KERN_WARNING
>> >> > > -                      "ODEBUG: object is on stack, but not annotated\n");
>> >> > > -       else
>> >> > > +                      "ODEBUG: object is on stack, but not annotated: %p\n",
>> >> > > +                      p->func);
>> >> > > +       } else
>> >> > >                 printk(KERN_WARNING
>> >> > >                        "ODEBUG: object is not on stack, but annotated\n");
>> >> > >         WARN_ON(1);
>> >> > >
>> >> >
>> >> > Somehow your attached patch did not apply.
>> >> > As the changes were only a few lines, I applied them by hand.
>> >> > Attached are log, dmesg and patches (orig + mine).
>> >>
>> >> Hmmm...  Does 0xc10231a1 correspond to a function in your build?  If so,
>> >> could you please let me know which one?
>> >>
>> >> OK, so according to "ps" the per-CPU kthread is runnable, but it appears
>> >> to never run.  You only have one CPU, so it cannot be waiting due to
>> >> running on the wrong CPU.  The only other loop is in wait_event(), and
>> >> that code looks good -- besides, if wait_event() was broken, we would
>> >> be seeing breakage everywhere.
>> >>
>> >> Peter, any thoughts on what I might have done wrong to get the scheduler
>> >> into a state where it was ignoring a runnable realtime task?
>> >
>> > Hello, Sedat,
>> >
>> > Here is a diagnostic patch to apply on top of sedat.2011.04.23a from
>> > the -rcu git tree.  Could you please try it out, let me know what
>> > happens, and run the last collectdebugfs.sh during the test?
>> >
>> >                                                         Thanx, Paul
>> >
>> > ------------------------------------------------------------------------
>> >
>> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>> > index 6cf6e47..65ae701 100644
>> > --- a/kernel/rcutree.c
>> > +++ b/kernel/rcutree.c
>> > @@ -1524,9 +1524,9 @@ static void rcu_cpu_kthread_setrt(int cpu, int to_rt)
>> >                 return;
>> >         if (to_rt) {
>> >                 policy = SCHED_NORMAL;
>> > -               sp.sched_priority = RCU_KTHREAD_PRIO;
>> > +               sp.sched_priority = 0;
>> >         } else {
>> > -               policy = SCHED_FIFO;
>> > +               policy = SCHED_NORMAL;
>> >                 sp.sched_priority = 0;
>> >         }
>> >         sched_setscheduler_nocheck(t, policy, &sp);
>> > @@ -1566,8 +1566,8 @@ static void rcu_yield(void (*f)(unsigned long), unsigned long arg)
>> >         sp.sched_priority = 0;
>> >         sched_setscheduler_nocheck(current, SCHED_NORMAL, &sp);
>> >         schedule();
>> > -       sp.sched_priority = RCU_KTHREAD_PRIO;
>> > -       sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
>> > +       sp.sched_priority = 0;
>> > +       sched_setscheduler_nocheck(current, SCHED_NORMAL, &sp);
>> >         del_timer(&yield_timer);
>> >  }
>> >
>> > @@ -1671,8 +1671,8 @@ static int __cpuinit rcu_spawn_one_cpu_kthread(int cpu)
>> >         WARN_ON_ONCE(per_cpu(rcu_cpu_kthread_task, cpu) != NULL);
>> >         per_cpu(rcu_cpu_kthread_task, cpu) = t;
>> >         wake_up_process(t);
>> > -       sp.sched_priority = RCU_KTHREAD_PRIO;
>> > -       sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
>> > +       sp.sched_priority = 0;
>> > +       sched_setscheduler_nocheck(t, SCHED_NORMAL, &sp);
>> >         return 0;
>> >  }
>> >
>> > @@ -1713,8 +1713,8 @@ static int rcu_node_kthread(void *arg)
>> >                                 continue;
>> >                         }
>> >                         per_cpu(rcu_cpu_has_work, cpu) = 1;
>> > -                       sp.sched_priority = RCU_KTHREAD_PRIO;
>> > -                       sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
>> > +                       sp.sched_priority = 0;
>> > +                       sched_setscheduler_nocheck(t, SCHED_NORMAL, &sp);
>> >                         preempt_enable();
>> >                 }
>> >         }
>> > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
>> > index a21413d..baee185 100644
>> > --- a/kernel/rcutree_plugin.h
>> > +++ b/kernel/rcutree_plugin.h
>> > @@ -1307,8 +1307,8 @@ static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
>> >         rnp->boost_kthread_task = t;
>> >         raw_spin_unlock_irqrestore(&rnp->lock, flags);
>> >         wake_up_process(t);
>> > -       sp.sched_priority = RCU_KTHREAD_PRIO;
>> > -       sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
>> > +       sp.sched_priority = 0;
>> > +       sched_setscheduler_nocheck(t, SCHED_NORMAL, &sp);
>> >         return 0;
>> >  }
>> >
>> >
>>
>> Hi Paul,
>>
>> I have tested with your patch and kept the kernel-config file from
>> previous tests (don't get confused by the new name).
>> Hope this helps you.
>>
>> I have some questions about kernel-config options, especially X86_UP
>> and CONFIG_RCU_FANOUT=32.
>> To what extent can they influence our RCU issue?
>> The options below were not set for this round of testing, but I would
>> like to get some feedback on them.
>> Thanks in advance.
>>
>> Would these settings be better suited to a UP machine?
>>
>> # CONFIG_SMP is not set
>> # CONFIG_M486 is not set
>> CONFIG_M686=y
>> CONFIG_NR_CPUS=1
>
> These should be fine.
>
>> CONFIG_X86_UP_APIC=y
>> CONFIG_X86_UP_IOAPIC=y
>
> These I don't know about.
>
>> CONFIG_HIGHMEM4G=y
>
> This one seems good for allowing the system to go as long as possible.
>
>> Is CONFIG_RCU_FANOUT=32 OK?
>
> On a UP system, this one doesn't matter.
>
>> With reverting commit 687d7a960aea46e016182c7ce346d62c4dbd0366 ("rcu:
>> restrict TREE_RCU to SMP builds with !PREEMPT").
>
> Thank you for trying this one out!
>
> I don't see any sign of a grace-period hang.  Did your test complete
> correctly?
>
>                                                         Thanx, Paul
>

Thanks for the comments.

I let the script run for a long time (approx. one hour) while doing my
daily work in parallel.
Then I booted into a known-good kernel.
Did I miss something? Should I stress the system more?

- Sedat -
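
Note on the diagnostic patch quoted above: it switches the RCU kthreads
from SCHED_FIFO to SCHED_NORMAL. One way to confirm from user space which
policy a given task actually runs under is sched_getscheduler(2). The
following is only a minimal sketch and was not part of the original
exchange; the helper names policy_name() and report_policy() are made up
for illustration, and the PID of the kthread of interest is assumed to be
looked up separately (for example with "ps").

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: map a scheduling policy constant to a readable name. */
static const char *policy_name(int policy)
{
	switch (policy) {
	case SCHED_OTHER:	/* user-space name for the kernel's SCHED_NORMAL */
		return "SCHED_OTHER";
	case SCHED_FIFO:
		return "SCHED_FIFO";
	case SCHED_RR:
		return "SCHED_RR";
	default:
		return "unknown";
	}
}

/*
 * Hypothetical helper: print the scheduling policy of the task with the
 * given PID (0 means the calling thread).
 * Returns the policy value, or -1 if the query failed.
 */
static int report_policy(pid_t pid)
{
	int policy = sched_getscheduler(pid);

	if (policy < 0) {
		perror("sched_getscheduler");
		return -1;
	}
	printf("pid %d: %s\n", (int)pid, policy_name(policy));
	return policy;
}
```

Calling report_policy() with the PID of an RCU kthread before and after
applying the patch should show whether the policy change took effect,
assuming the PID passed in really belongs to that kthread.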