Re: perf: aux area related crash and warnings

From: Alexander Shishkin
Date: Tue Jun 16 2015 - 07:37:21 EST


Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx> writes:

> Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
>
>> Alex, any clue?
>
> Let me look into it. Definitely haven't seen anything like that in my
> tests.
>
>> On Fri, Jun 12, 2015 at 02:42:36PM -0400, Vince Weaver wrote:
>>> On Thu, 11 Jun 2015, Vince Weaver wrote:
>>>
>>> > and while I was trying to cut and paste that, the locked haswell just took
>>> > down the network switch so I can't get the rest until tomorrow.
>>>
>>> here are the full dumps if anyone is interested
>>>
>>> the warning is reproducible, the spinlock disaster is not.
>>>
>>> [36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
>>> [36298.992915] lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
>>> [36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G W 4.1.0-rc7+ #155
>>> [36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>>> [36299.020606] ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
>>> [36299.029199] ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
>>> [36299.037796] ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
>>> [36299.046338] Call Trace:
>>> [36299.049501] <NMI> [<ffffffff816d7229>] dump_stack+0x45/0x57
>>> [36299.056284] [<ffffffff810c2895>] spin_dump+0x85/0xe0
>>> [36299.062282] [<ffffffff810c2916>] spin_bug+0x26/0x30
>>> [36299.068111] [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
>>> [36299.074897] [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
>>> [36299.081276] [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
>>> [36299.088340] [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
>>> [36299.095182] [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
>>> [36299.102291] [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
>>> [36299.108987] [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
>>> [36299.115850] [<ffffffff8117aa47>] __free_pages+0x37/0x50
>>> [36299.121991] [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
>
> This one ends up freeing aux pages from NMI context. It looks like the
> aux buffer was unmapped while the event was still running, so the last
> reference was dropped here, in the NMI handler.
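
To illustrate the failure mode described above, here is a minimal userspace
sketch of one general way to avoid freeing from a restricted context: when
the last reference is dropped there, queue the buffer and free it later from
a context where freeing is safe. This is not the patch for this bug and not
kernel code. All names (struct aux_buf, aux_put(), aux_drain_pending(), the
in_restricted_ctx flag) are hypothetical, the refcount is a plain int, and
the sketch is single-threaded. Real kernel code would need atomic
refcounting and an NMI-safe way to defer the work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an AUX buffer; not the kernel's structure. */
struct aux_buf {
	int refcount;			/* plain int: sketch is single-threaded */
	void *pages;			/* stands in for the AUX pages */
	struct aux_buf *next_pending;	/* link in the deferred-free list */
};

static struct aux_buf *pending_head;	/* buffers waiting to be freed */
static int freed_count;			/* how many buffers were really freed */

/* Allocate a buffer holding one reference (the "mapping" reference). */
static struct aux_buf *aux_alloc(size_t size)
{
	struct aux_buf *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	b->pages = malloc(size);
	if (!b->pages) {
		free(b);
		return NULL;
	}
	b->refcount = 1;
	return b;
}

/* Take an extra reference, e.g. on behalf of a running event. */
static void aux_get(struct aux_buf *b)
{
	b->refcount++;
}

/* Actually release the memory; only call where freeing is allowed. */
static void aux_free_now(struct aux_buf *b)
{
	free(b->pages);
	free(b);
	freed_count++;
}

/*
 * Drop a reference. in_restricted_ctx models "called from NMI": in that
 * case the final put must not free, so the buffer is only queued.
 */
static void aux_put(struct aux_buf *b, bool in_restricted_ctx)
{
	if (--b->refcount > 0)
		return;

	if (in_restricted_ctx) {
		b->next_pending = pending_head;
		pending_head = b;
	} else {
		aux_free_now(b);
	}
}

/* Free everything queued by aux_put(); returns the number freed. */
static int aux_drain_pending(void)
{
	int n = 0;

	while (pending_head) {
		struct aux_buf *b = pending_head;

		pending_head = b->next_pending;
		aux_free_now(b);
		n++;
	}
	return n;
}
```

In this sketch, the sequence from the trace corresponds to aux_alloc(),
aux_get() for the running event, aux_put(b, false) on unmap, and then
aux_put(b, true) from the handler. The last call only queues the buffer, and
nothing is freed until aux_drain_pending() runs.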

Ok, here's what I propose for this one.