Re: [PATCH v6 2/2] arm64: enable context tracking
From: Kevin Hilman
Date: Tue Jun 03 2014 - 13:34:43 EST
Will Deacon <will.deacon@xxxxxxx> writes:
> Hi guys,
>
> On Fri, May 30, 2014 at 08:08:38PM +0100, Kevin Hilman wrote:
>> Will Deacon <will.deacon@xxxxxxx> writes:
>> > I'd like to give these some stress testing before it gets merged, so I'm
>> > not sure if it'll make it for 3.16 given where we are at the moment.
>>
>> FWIW, this feature is disabled by default. I use the following kconfig
>> fragment to enable the various parts I use for testing:
>>
>> CONFIG_NO_HZ=y
>> CONFIG_NO_HZ_FULL=y
>> CONFIG_NO_HZ_FULL_ALL=y
>> CONFIG_NO_HZ_FULL_SYSIDLE=y
>>
>> # default to power-efficient workqueues (which are then set to unbound)
>> CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y
>>
>> # lockup detector sets a 4s timer on every CPU, which wakes CPUs
>> # from idle. (alternately, can be controlled via procfs,
>> # e.g: echo 0 > /proc/sys/kernel/watchdog)
>> #CONFIG_LOCKUP_DETECTOR=n
>
> I had a go with this, but I couldn't seem to trigger any context tracking
> without forcing CONFIG_CONTEXT_TRACKING_FORCE=y. Does that mean we're
> missing something else?
No, it just means that you never hit the conditions that trigger full
NOHZ. Using _FORCE is a good way to exercise this code, since it enables
the context tracking paths whether or not full NOHZ actually needs them.
> Anyway, with that forced on, I see the following during boot:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:418 rcu_eqs_enter+0x84/0xa4()
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.15.0-rc8+ #5
> Call trace:
> [<ffffffc000088048>] dump_backtrace+0x0/0x130
> [<ffffffc000088188>] show_stack+0x10/0x1c
> [<ffffffc0004891a0>] dump_stack+0x74/0xbc
> [<ffffffc0000a45e0>] warn_slowpath_common+0x8c/0xb4
> [<ffffffc0000a46cc>] warn_slowpath_null+0x14/0x20
> [<ffffffc0000efc14>] rcu_eqs_enter+0x80/0xa4
> [<ffffffc0000efc58>] rcu_idle_enter+0x20/0x50
> [<ffffffc0000dd314>] cpu_startup_entry+0x118/0x184
> [<ffffffc0004865ec>] rest_init+0x7c/0x88
> [<ffffffc000609800>] start_kernel+0x368/0x37c
> ---[ end trace c17313e162496e65 ]---
So this suggests that we've told RCU we entered userspace twice without
leaving it in between (the context tracker is an extension of the RCU
extended quiescent state machinery).
After some IRC discussion with Will, I was able to reproduce this (using
a full Ubuntu rootfs and CONFIG_CONTEXT_TRACKING_FORCE=y), and I think I
found the bug.
Basically, the problem is that we have a ct_user_exit in el1_irq
(interrupt taken from kernel space) when it should be in el0_irq
(interrupt taken from user space).
With the ct_user_exit moved into el0_irq, I can no longer reproduce the problem.
Larry, could you sanity check that and respin a v8 with that change if
it works for you?
Thanks,
Kevin