Re: localed stuck in recent 3.18 git in copy_net_ns?

From: Jay Vosburgh
Date: Sat Oct 25 2014 - 14:41:24 EST


Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:

>On Fri, Oct 24, 2014 at 05:20:48PM -0700, Jay Vosburgh wrote:
>> Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> >On Fri, Oct 24, 2014 at 03:59:31PM -0700, Paul E. McKenney wrote:
>> [...]
>> >> Hmmm... It sure looks like we have some callbacks stuck here. I clearly
>> >> need to take a hard look at the sleep/wakeup code.
>> >>
>> >> Thank you for running this!!!
>> >
>> >Could you please try the following patch? If no joy, could you please
>> >add rcu:rcu_nocb_wake to the list of ftrace events?
>>
>> I tried the patch; it did not change the behavior.
>>
>> I enabled the rcu:rcu_barrier and rcu:rcu_nocb_wake tracepoints
>> and ran it again (with this patch and the first patch from earlier
>> today); the trace output is a bit on the large side so I put it and the
>> dmesg log at:
>>
>> http://people.canonical.com/~jvosburgh/nocb-wake-dmesg.txt
>>
>> http://people.canonical.com/~jvosburgh/nocb-wake-trace.txt
>
>Thank you again!
>
>Very strange part of the trace. The only signs of CPUs 2 and 3 are:
>
> ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
> ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
> ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
> ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
> ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot
> ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
> ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot
> ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
> ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
> ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
> ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll
> ovs-vswitchd-902 [000] .... 109.896843: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
>
>The pair of WakeNotPoll trace entries says that at that point, RCU believed
>that CPU 2's and CPU 3's rcuo kthreads did not exist. :-/

On the test system I'm using, CPUs 2 and 3 really do not exist;
it is a 2 CPU system (Intel Core 2 Duo E8400). I mentioned this in an
earlier message, but perhaps you missed it in the flurry.

Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,

[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.

but later messages show only 2:

[ 0.233703] x86: Booting SMP configuration:
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs

In any event, the E8400 is a 2-core CPU with no hyperthreading.
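
The mismatch in the dmesg excerpts above can be checked mechanically. What follows is only a rough Python sketch, not anything taken from the kernel or from the patches in this thread: the function names and the SAMPLE_DMESG constant are made up for illustration, the regular expressions are matched only against the message formats quoted above, and it assumes the booted CPUs are numbered 0..N-1.

```python
import re

# Lines copied from the dmesg excerpts quoted in this message.
SAMPLE_DMESG = """\
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.
[ 0.255528] x86: Booted up 1 node, 2 CPUs
"""


def parse_cpu_list(spec):
    """Expand a CPU list such as "0-3." or "0,2-3" into a set of ints."""
    cpus = set()
    for part in spec.strip().rstrip(".").split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def cpu_counts_from_dmesg(text):
    """Return (nr_cpu_ids, booted CPU count, set of offloaded CPUs)."""
    nr_cpu_ids = None
    booted = None
    offloaded = set()
    for line in text.splitlines():
        m = re.search(r"nr_cpu_ids=(\d+)", line)
        if m:
            nr_cpu_ids = int(m.group(1))
        m = re.search(r"Booted up \d+ nodes?, (\d+) CPUs", line)
        if m:
            booted = int(m.group(1))
        m = re.search(r"Offload RCU callbacks from CPUs: (\S+)", line)
        if m:
            offloaded = parse_cpu_list(m.group(1))
    return nr_cpu_ids, booted, offloaded


def offloaded_but_not_booted(text):
    """CPUs listed for callback offloading that were never booted.

    Assumes booted CPUs are numbered 0..booted-1.
    """
    _, booted, offloaded = cpu_counts_from_dmesg(text)
    if booted is None:
        return set()
    return offloaded - set(range(booted))


if __name__ == "__main__":
    print(sorted(offloaded_but_not_booted(SAMPLE_DMESG)))
```

Fed the lines quoted above, this should report exactly the CPUs that show up with WakeNotPoll in the trace.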

-J

---
-Jay Vosburgh, jay.vosburgh@xxxxxxxxxxxxx