Re: [BUG/PATCH] x86 mmiotrace: dynamically disable non-boot CPUs

From: Pekka Paalanen
Date: Mon Apr 14 2008 - 14:03:10 EST


On Mon, 14 Apr 2008 08:57:13 +0200
Ingo Molnar <mingo@xxxxxxx> wrote:

> thanks Pekka, i've applied your patches.

Great, thanks.

> i'm wondering about this though:
>
> > Mmiotrace is not reliable with multiple CPUs and may miss events. Drop
> > to single CPU when mmiotrace is activated.
>
> we should fix this restriction ASAP. Forcibly dropping to UP will cause
> mmiotrace to be much less useful for diagnostic purposes of Linux

Ok, how do you propose we solve this?

I have asked the question before, and then I had two ideas. Well, the
first one was actually your idea (so I hear) to solve the same problem for
kmemcheck.
- per-cpu page tables
- instead of single-stepping, emulate the faulting instruction and never
disarm pages during tracing. (Use and modify code from KVM.)

I don't believe either of these is easy or fast to implement. Given some
months, I might be able to achieve emulation. Page tables are still
magic to me.

> drivers. We want to enable the mmiotrace-ing of specific devices via
> some /sys flag. For example via:
>
> cat /sys/devices/pci0000\:00/0000\:00\:1f.2/mmiotrace
>
> this should start mmiotracing of that specific device - or something
> like that. Hm?

Ooh, that sounds like a neat interface. I like it. I've been
half-thinking of different ways of specifying the set of devices to
trace, but none of them was particularly nice.
This feature is for post-2.6.26, right?

Ok, so first select mmiotrace tracer, at which point those sysfs files
appear, but mmiotrace is not activated yet. Then, when any of the
device specific files, or the global file in debugfs, is opened,
mmiotrace activates. And if the file is closed, mmiotrace deactivates.
Should we support several "cats" at the same time?

> > When I tested this patch on Intel Core 2 Duo, enter_uniprocessor()
> > triggered the following kernel bug:
> >
> > Linux version 2.6.25-rc8-sched-devel.git-x86-latest.git (paalanen@ct200006)
> > (gcc version 4.1.2 (Gentoo 4.1.2 p1.0.1)) #2 SMP PREEMPT Sun Apr 13
> > 22:09:03 EEST 2008
> > ...
> > in mmio_trace_init
> > mmiotrace: Disabling non-boot CPUs...
> > CPU 1 is now offline
> > lockdep: fixing up alternatives.
> > SMP alternatives: switching to UP code
> > BUG: sleeping function called from invalid context at mm/slab.c:3053
> > in_atomic():1, irqs_disabled():0
> > 5 locks held by bash/4423:
> > #0: (trace_types_lock){--..}, at: [<ffffffff8026442f>] tracing_set_trace_write+0x93/0x11a
> > #1: (mmiotrace_mutex){--..}, at: [<ffffffff802251e0>] enable_mmiotrace+0x17/0x142
> > #2: (cpu_add_remove_lock){--..}, at: [<ffffffff802580e5>] cpu_maps_update_begin+0x12/0x14
> > #3: (&cpu_hotplug.lock){--..}, at: [<ffffffff8025814f>] cpu_hotplug_begin+0x39/0x9f
> > #4: (smp_alt){--..}, at: [<ffffffff80211bd9>] alternatives_smp_switch+0x66/0x1ab
> > Pid: 4423, comm: bash Not tainted 2.6.25-rc8-sched-devel.git-x86-latest.git #2
> >
> > Call Trace:
> > [<ffffffff802520c0>] ? __debug_show_held_locks+0x22/0x24
> > [<ffffffff8022d292>] __might_sleep+0xd9/0xdb
> > [<ffffffff80287326>] cache_alloc_debugcheck_before+0x23/0x32
> > [<ffffffff80287a32>] __kmalloc+0x34/0xa5
> > [<ffffffff80209393>] ? clear_ti_thread_flag+0x10/0x17
> > [<ffffffff8027d7f6>] kmalloc_node+0x9/0xb
> > [<ffffffff8027d8e0>] __get_vm_area_node+0xa2/0x1cb
> > [<ffffffff80209393>] ? clear_ti_thread_flag+0x10/0x17
> > [<ffffffff8027da41>] __get_vm_area+0x13/0x15
> > [<ffffffff8027da60>] get_vm_area+0x1d/0x1f
> > [<ffffffff8027e14c>] vmap+0x2a/0x5c
> > [<ffffffff8021191b>] text_poke+0xaa/0x136
> > [<ffffffff804cba2b>] ? _etext+0x0/0x5
> > [<ffffffff802119f6>] alternatives_smp_unlock+0x4f/0x63
> > [<ffffffff80211ce1>] alternatives_smp_switch+0x16e/0x1ab
> > [<ffffffff8021b163>] __cpu_die+0x53/0x7d
> > [<ffffffff802583e2>] _cpu_down+0x195/0x26c
> > [<ffffffff802585ca>] cpu_down+0x26/0x36
> > [<ffffffff80225270>] enable_mmiotrace+0xa7/0x142
> > [<ffffffff80266b8d>] mmio_trace_init+0x3c/0x40
> > [<ffffffff8026448e>] tracing_set_trace_write+0xf2/0x11a
> > [<ffffffff80327fac>] ? security_file_permission+0x11/0x13
> > [<ffffffff8028b047>] vfs_write+0xa7/0xe1
> > [<ffffffff8028b13b>] sys_write+0x47/0x6d
> > [<ffffffff8020b4db>] system_call_after_swapgs+0x7b/0x80
> >
> > mmiotrace: CPU1 is down.
> > mmiotrace: enabled.
> >
> > Is this my fault, or is there a bug somewhere else? The kernel tree is
> > sched-devel/latest git from 12th April, IIRC.
>
> there's no known bug of sched-devel/latest of this kind (or any known
> bug for that matter).
>
> i suspect the bug is that you bring the CPU down from an atomic
> (spinlocked or irq disabled) context.

Hmm, it should not be... I have to double-check, but all the other code,
too, from where enter_uniprocessor() is called, may sleep. The first thing
the caller does is to acquire a mutex, which I assume would complain
loudly if spinlocked or irq-disabled.

Ingo, thank you for fixing this patch, though I'd like to suggest to
leave it out for now, since there clearly are worse problems with it
than without it. And if we can solve the SMP issue, this is not needed.
For the time being we can just instruct users to disable all but one CPU
when try want to trace.


Thanks.

--
Pekka Paalanen
http://www.iki.fi/pq/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/