Re: Linux 3.18 released

From: Vince Weaver
Date: Mon Dec 08 2014 - 13:38:24 EST


On Sun, 7 Dec 2014, Linus Torvalds wrote:

> I'd love to say that we've figured out the problem that plagues 3.17
> for a couple of people, but we haven't. At the same time, there's
> absolutely no point in having everybody else twiddling their thumbs
> when a couple of people are actively trying to bisect an older issue,
> so holding up the release just didn't make sense. Especially since
> that would just have then held things up entirely over the holiday
> break.
>
> So the merge window for 3.19 is open, and DaveJ will hopefully get his
> bisection done (or at least narrow things down sufficiently that we
> have that "Ahaa" moment) over the next week. But in solidarity with
> Dave (and to make my life easier too ;) let's try to avoid introducing
> any _new_ nasty issues, ok?

It's probably unrelated to DaveJ's issue, but my perf_event fuzzer still
quickly locks the kernel pretty solid on 3.18.

Just 5 minutes of testing managed to trip over the following issue, which
dates back to at least 3.15-rc7.

My notes from the last time I tracked this issue down say:

What happens is that in find_get_context() in kernel/events/core.c,
perf_lock_task_context() somehow returns NULL
due to !atomic_inc_not_zero(&ctx->refcount)
but task->perf_event_ctxp[ctxn] still has a valid value.
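
A minimal userspace sketch of why that state would spin forever. It
assumes find_get_context() retries whenever it allocates a fresh context
but finds the task's slot already occupied, which is my reading of the
code and fits the alternating alloc_perf_context / put_ctx traces further
down. Every model_* name is hypothetical and only stands in for the
kernel function named in its comment; the retry bound exists only so the
model terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct model_ctx {
    int refcount;
};

struct model_task {
    struct model_ctx *ctxp;     /* models task->perf_event_ctxp[ctxn] */
};

/* Models atomic_inc_not_zero(): only take a reference if one still exists. */
static bool model_inc_not_zero(int *refcount)
{
    if (*refcount == 0)
        return false;
    (*refcount)++;
    return true;
}

/* Models perf_lock_task_context(): NULL if no context or refcount hit zero. */
static struct model_ctx *model_lock_task_context(struct model_task *task)
{
    struct model_ctx *ctx = task->ctxp;

    if (ctx && !model_inc_not_zero(&ctx->refcount))
        return NULL;
    return ctx;
}

/*
 * Models the retry path in find_get_context().
 * Returns the number of retries needed, -1 if max_retries ran out
 * (the assumed kernel loop has no such bound), -2 on allocation failure.
 */
int model_find_get_context(struct model_task *task, int max_retries)
{
    assert(task != NULL);

    for (int tries = 0; tries < max_retries; tries++) {
        struct model_ctx *ctx = model_lock_task_context(task);
        if (ctx)
            return tries;               /* existing context, reference taken */

        /* No usable context: allocate one, as alloc_perf_context() would. */
        struct model_ctx *fresh = calloc(1, sizeof(*fresh));
        if (!fresh)
            return -2;
        fresh->refcount = 1;

        if (task->ctxp == NULL) {
            fresh->refcount++;          /* reference held by the task slot */
            task->ctxp = fresh;
            return tries;
        }

        /* Slot still holds the dead context: drop ours and go around again. */
        free(fresh);                    /* models put_ctx() */
    }
    return -1;
}
```

With a task whose slot points at a context with refcount 0, the model
never succeeds: the lock step fails every time and the install step
refuses every time, so it only stops when the artificial bound runs out.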

There are multiple perf related issues like this that are hard to track
down. They are borderline heisenbugs that are possibly race conditions,
so bisecting doesn't work, and even things like enabling ftrace will make
the issue go away (or crash ftrace itself).

This particular manifestation of the bug (or bugs) wedges things but I can
use alt-sysrq from the serial console to see where it is stuck (see
below; the CPU is stuck in a loop).


[ 2225.916004] [<ffffffff810e61e9>] ? get_page_from_freelist+0x55/0x781
[ 2225.916004] [<ffffffff810e6a7c>] __alloc_pages_nodemask+0x167/0x6dc
[ 2225.916004] [<ffffffff8101a4a3>] ? intel_pmu_enable_all+0x28/0xa4
[ 2225.916004] [<ffffffff8111f0b3>] kmem_getpages+0x58/0xec
[ 2225.916004] [<ffffffff81120278>] cache_grow+0xad/0x1d8
[ 2225.916004] [<ffffffff81120021>] ____cache_alloc+0x237/0x2ce
[ 2225.916004] [<ffffffff811216b9>] __kmalloc+0x8f/0xf2
[ 2225.916004] [<ffffffff810dc35d>] ? T.1336+0xe/0x10
[ 2225.916004] [<ffffffff810dc35d>] T.1336+0xe/0x10
[ 2225.916004] [<ffffffff810dc8ca>] alloc_perf_context+0x20/0x51
[ 2225.916004] [<ffffffff810dca33>] find_get_context+0x138/0x1c7
[ 2225.916004] [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
[ 2225.916004] [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
[ 2225.916004] [<ffffffff81560016>] system_call_fastpath+0x16/0x1b

[ 2256.708004] [<ffffffff810d7e36>] ? put_ctx+0x40/0x61
[ 2256.708004] [<ffffffff810dcaa4>] find_get_context+0x1a9/0x1c7
[ 2256.708004] [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
[ 2256.708004] [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
[ 2256.708004] [<ffffffff81560016>] system_call_fastpath+0x16/0x1b

[ 2303.796003] [<ffffffff810fa6cb>] ? kmalloc_slab+0x7f/0x8d
[ 2303.796003] [<ffffffff81121653>] __kmalloc+0x29/0xf2
[ 2303.796003] [<ffffffff810dc35d>] ? T.1336+0xe/0x10
[ 2303.796003] [<ffffffff810dc35d>] T.1336+0xe/0x10
[ 2303.796003] [<ffffffff810dc8ca>] alloc_perf_context+0x20/0x51
[ 2303.796003] [<ffffffff810dca33>] find_get_context+0x138/0x1c7
[ 2303.796003] [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
[ 2303.796003] [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
[ 2303.796003] [<ffffffff81560016>] system_call_fastpath+0x16/0x1b

Vince