Re: [BUG] perf and kmemcheck : fatal combination
From: Pekka Enberg
Date: Tue Apr 26 2011 - 06:08:43 EST
On Tue, Apr 26, 2011 at 12:53 PM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> On Tuesday, 26 April 2011 at 10:57 +0200, Eric Dumazet wrote:
>> On Tuesday, 26 April 2011 at 10:04 +0200, Ingo Molnar wrote:
>>
>> > Eric, does it manage to limp along if you remove the BUG_ON()?
>> >
>> > That risks NMI recursion but maybe it allows you to see why things are slow,
>> > before it crashes ;-)
>> >
>>
>> If I remove the BUG_ON from nmi_enter, it seems to crash very fast
>
> Before you ask, here are some more complete netconsole traces:
>
> [ 306.657192] ------------[ cut here ]------------
> [ 306.657195] ------------[ cut here ]------------
> [ 306.657202] WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xa9/0xc0()
> [ 306.657204] Hardware name: ProLiant BL460c G6
> [ 306.657205] Modules linked in: nfsd lockd auth_rpcgss sunrpc tg3 libphy sg [last unloaded: x_tables]
> [ 306.657211] Pid: 3955, comm: perf Not tainted 2.6.39-rc4-00369-g23cf772-dirty #559
> [ 306.657212] Call Trace:
> [ 306.657214] <NMI> [<ffffffff8102ac39>] ? kmemcheck_fault+0xa9/0xc0
> [ 306.657221] [<ffffffff810427db>] warn_slowpath_common+0x8b/0xc0
> [ 306.657223] [<ffffffff81042825>] warn_slowpath_null+0x15/0x20
> [ 306.657226] [<ffffffff8102ac39>] kmemcheck_fault+0xa9/0xc0
> [ 306.657229] [<ffffffff8147ca4b>] do_page_fault+0x1fb/0x560
> [ 306.657234] [<ffffffff811d0289>] ? put_dec+0x59/0x60
> [ 306.657237] [<ffffffff811d0591>] ? number+0x301/0x330
> [ 306.657239] [<ffffffff8147a48f>] page_fault+0x1f/0x30
> [ 306.657245] [<ffffffff8124dce5>] ? vt_console_print+0x85/0x360
> [ 306.657247] [<ffffffff8124dcda>] ? vt_console_print+0x7a/0x360
> [ 306.657250] [<ffffffff81043159>] __call_console_drivers+0x89/0xa0
> [ 306.657252] [<ffffffff810431bb>] _call_console_drivers+0x4b/0x80
> [ 306.657254] [<ffffffff810432d7>] console_unlock+0xe7/0x1e0
> [ 306.657257] [<ffffffff8104388e>] vprintk+0x1ee/0x4a0
> [ 306.657260] [<ffffffff8102ac39>] ? kmemcheck_fault+0xa9/0xc0
> [ 306.657262] [<ffffffff81043ba7>] printk+0x67/0x70
> [ 306.657264] [<ffffffff8102ac39>] ? kmemcheck_fault+0xa9/0xc0
> [ 306.657267] [<ffffffff81042789>] warn_slowpath_common+0x39/0xc0
> [ 306.657269] [<ffffffff81042825>] warn_slowpath_null+0x15/0x20
> [ 306.657271] [<ffffffff8102ac39>] kmemcheck_fault+0xa9/0xc0
> [ 306.657273] [<ffffffff8147ca4b>] do_page_fault+0x1fb/0x560
> [ 306.657276] [<ffffffff8101167b>] ? intel_pmu_drain_bts_buffer+0x2b/0x170
> [ 306.657279] [<ffffffff8147a48f>] page_fault+0x1f/0x30
> [ 306.657282] [<ffffffff8100ef42>] ? x86_perf_event_update+0x12/0x70
> [ 306.657284] [<ffffffff810104b1>] ? intel_pmu_save_and_restart+0x11/0x20
> [ 306.657287] [<ffffffff81012e84>] intel_pmu_handle_irq+0x1d4/0x420
> [ 306.657290] [<ffffffff8147b570>] perf_event_nmi_handler+0x50/0xc0
> [ 306.657292] [<ffffffff8147cfa3>] notifier_call_chain+0x53/0x80
> [ 306.657294] [<ffffffff8147d018>] __atomic_notifier_call_chain+0x48/0x70
> [ 306.657296] [<ffffffff8147d051>] atomic_notifier_call_chain+0x11/0x20
> [ 306.657298] [<ffffffff8147d08e>] notify_die+0x2e/0x30
> [ 306.657300] [<ffffffff8147a8af>] do_nmi+0x4f/0x200
> [ 306.657302] [<ffffffff8147a6ea>] nmi+0x1a/0x20
> [ 306.657304] [<ffffffff8100fd4d>] ? intel_pmu_enable_all+0x9d/0x110
> [ 306.657305] <<EOE>> [<ffffffff810104da>] intel_pmu_nhm_enable_all+0x1a/0x120
> [ 306.657309] [<ffffffff810131d4>] x86_pmu_enable+0x104/0x260
> [ 306.657313] [<ffffffff810a84e9>] perf_pmu_enable+0x39/0x50
> [ 306.657314] [<ffffffff8101236c>] x86_pmu_add+0xac/0x120
> [ 306.657317] [<ffffffff810aae68>] ? perf_install_in_context+0x18/0xa0
> [ 306.657319] [<ffffffff8102b001>] ? kmemcheck_pte_lookup+0x11/0x40
> [ 306.657322] [<ffffffff8147a48f>] ? page_fault+0x1f/0x30
> [ 306.657325] [<ffffffff810acf15>] event_sched_in+0x65/0x110
> [ 306.657327] [<ffffffff810afb95>] __perf_install_in_context+0x125/0x140
> [ 306.657330] [<ffffffff810ab100>] ? perf_remove_from_context+0xa0/0xa0
> [ 306.657332] [<ffffffff810ab159>] remote_function+0x59/0x70
> [ 306.657335] [<ffffffff81075d6e>] smp_call_function_single+0x8e/0x170
> [ 306.657338] [<ffffffff810a86a4>] cpu_function_call+0x34/0x40
> [ 306.657340] [<ffffffff810afa70>] ? perf_tp_event+0xf0/0xf0
> [ 306.657342] [<ffffffff810aaedf>] perf_install_in_context+0x8f/0xa0
> [ 306.657345] [<ffffffff810b0792>] sys_perf_event_open+0x592/0x7a0
> [ 306.657348] [<ffffffff814819a9>] sysenter_dispatch+0x7/0x27
> [ 306.657350] ---[ end trace 7333dc2d81c31e96 ]---
That's just the kmemcheck fault handler warning about in_nmi(). You could
try making the relevant perf allocations use __GFP_NOTRACK and/or
SLAB_NOTRACK so that the perf NMI handler does not page-fault on them.
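
To spell out the reasoning, here is a rough user-space model, not kernel
code. It assumes kmemcheck makes accesses to tracked memory page-fault and
that the warning fires when such a fault arrives while in_nmi(). All
model_* names and MODEL_NOTRACK are made up for illustration; MODEL_NOTRACK
stands in for __GFP_NOTRACK / SLAB_NOTRACK.

```c
/* User-space model only. Every model_* name here is hypothetical. */
#include <stdbool.h>

/* Stand-in for the kernel's __GFP_NOTRACK / SLAB_NOTRACK opt-out flags. */
#define MODEL_NOTRACK 0x1u

struct model_buf {
	unsigned int flags;	/* flags the buffer was allocated with */
};

/* Assumption: a buffer is tracked unless its allocation opted out. */
static inline bool model_is_tracked(const struct model_buf *b)
{
	return (b->flags & MODEL_NOTRACK) == 0;
}

/* Assumption: any access to a tracked buffer takes a page fault. */
static inline bool model_access_faults(const struct model_buf *b)
{
	return model_is_tracked(b);
}

/* The warning corresponds to such a fault arriving in NMI context. */
static inline bool model_warns(const struct model_buf *b, bool in_nmi)
{
	return in_nmi && model_access_faults(b);
}
```

In this model, a buffer allocated with flags == 0 and touched from the NMI
handler warns, while one allocated with MODEL_NOTRACK does not. Which perf
allocations would actually need the flag is not something the trace alone
settles.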
Pekka