Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3)
From: Mathieu Desnoyers
Date: Thu Apr 17 2008 - 20:06:09 EST
* Andi Kleen (andi@xxxxxxxxxxxxxx) wrote:
> Mathieu Desnoyers wrote:
> > * Jeremy Fitzhardinge (jeremy@xxxxxxxx) wrote:
> >> Mathieu Desnoyers wrote:
> >>> "This way lies madness. Don't go there."
> >>>
> >> It is a large amount of... stuff. This immediate values thing makes a big
> >> improvement then?
> >>
> >
> > As Ingo said: the NMI-safe traps and exceptions are not only useful for
> > immediate values, but also for oprofile.
>
> How is it useful to oprofile?
>
oprofile hooks into the NMI callbacks:
arch/x86/oprofile/nmi_timer_int.c: profile_timer_exceptions_notify()
calls
drivers/oprofile/oprofile_add_sample()
which calls oprofile_add_ext_sample()
where
    if (log_sample(cpu_buf, pc, is_kernel, event))
        oprofile_ops.backtrace(regs, backtrace_depth);
First, log_sample writes into the vmalloc'd cpu buffer. That's one
possible page fault.
Then, if a kernel backtrace is taken, I am not sure that printk_address
won't try to read some of the module data, which is vmalloc'd.
> > On top of that, the LTTng kernel
> > tracer has to write into vmalloc'd memory, so it's required there too.
>
> All this effort changing really critical (and also fragile) code paths
> used all the time is to handle setting markers into NMI functions. Or
> actually the special case of setting markers in there that access
> vmalloc() without calling vmalloc_sync().
>
Isn't vmalloc_sync() an expensive operation? I suppose it would imply
calling vmalloc_sync() after loading modules and after each buffer
allocation. And the patch is also needed to be able to put a breakpoint
there, for the immediate values.
> NMI are maybe 5-6 functions all over the kernel.
>
> I just don't think it makes any sense to put markers in there.
> It is a really small part of the kernel that is unlikely
> to be really useful for anybody. You should rather first solve the
> problem of tracing the other 99.999999% of the kernel properly.
>
The fact is that NMIs are very useful and powerful when trying to
understand where code that disables interrupts is stuck, or to read
performance counters periodically without suffering from IRQ latency.
Also, when trying to figure out what is actually happening in the
kernel timekeeping, having a stable periodic time source can be pretty
useful. Hooking this kind of feature into a tracer seems rather
logical.
> And then you could actually set the markers in there if you're
> crazy enough, just call vmalloc_sync().
>
That would be one way to do it, except that it would not deal with int3
(the iret at the end of the breakpoint handler re-enables NMIs while we
are still in NMI context). It would also have to be taken into account
at module load time. To me, that looks like an error-prone design. If
the problem is at the lower end of the architecture, in the interrupt
return path, why don't we simply fix it there for good?
> Mathieu argued earlier that markers should be set everywhere but
> that is also bogus because there is enough other code where
> you cannot set them either (one example would be early boot code[1])
>
Hmm? :) There is no "init" function in marker.c. It depends on the RCU
mechanism though, so I guess we can only instrument start_kernel after
rcu_init(). And yes, boot code is one of the first things embedded
system developers want to instrument.
> And to do anything in NMI context you cannot use any locks so you would
> have to write all data structures used by the markers lock less. I did
> that for the new mce code, but it's a really painful and bug-prone
> experience that I cannot really recommend to anybody.
>
LTTng is a lockless tracer which uses the RCU mechanism for control data
structure updates and a lockless cmpxchg_local scheme to manage the
per-cpu buffer space reservation. It has been out there for about 3
years now and is used in industry.
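For readers unfamiliar with the cmpxchg_local scheme mentioned above, here is a minimal userspace sketch of lockless buffer space reservation, assuming C11 atomics stand in for the kernel's cmpxchg_local() (which is CPU-local rather than a full SMP-atomic operation). The struct cpu_buffer layout, BUF_SIZE and reserve_slot() are hypothetical names, not LTTng's actual code. The idea is that a writer interrupted by an NMI that reserves its own slot simply sees its compare-and-swap fail and retries, so no lock is ever held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define BUF_SIZE 4096 /* hypothetical buffer size */

/* Hypothetical per-CPU buffer; illustrative only, not LTTng's layout. */
struct cpu_buffer {
    _Atomic size_t write_offset; /* offset of the next free byte */
    char data[BUF_SIZE];
};

/*
 * Reserve len bytes without taking a lock. Returns the offset of the
 * reserved slot, or (size_t)-1 if the buffer has no room left.
 * If an NMI (or any other writer) moves write_offset between our load
 * and our compare-and-swap, the CAS fails, 'old' is refreshed with the
 * current value, and we retry.
 */
static size_t reserve_slot(struct cpu_buffer *buf, size_t len)
{
    assert(buf != NULL);
    size_t old = atomic_load_explicit(&buf->write_offset, memory_order_relaxed);
    size_t new_off;

    do {
        if (len > BUF_SIZE - old)
            return (size_t)-1; /* does not fit */
        new_off = old + len;
    } while (!atomic_compare_exchange_weak_explicit(&buf->write_offset,
                                                    &old, new_off,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return old; /* caller owns data[old .. old + len) */
}
```

On an empty buffer, reserving 100 and then 200 bytes returns offsets 0 and 100; a reservation that does not fit returns (size_t)-1 and leaves the offset untouched. A real tracer would also need sub-buffer switching and commit accounting, which this sketch leaves out.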
> And then NMIs (and machine checks) are a really obscure case, very
> rarely used.
>
I wonder if they are used so rarely because the underlying kernel is
buggy with respect to NMIs or because they are useless.
> I think the right way is just to say that you cannot set markers
> into NMI and machine check. Even with this patch it is highly unlikely
> the resulting code will be correct anyways. Actually you could probably
> set them without the patch with some effort (like calling vmalloc_sync),
> but for the basic reasons mentioned above (lock less code is really
> hard, nmi type functions are less than a hundred lines in the millions
> of kernel LOCs) it is just a very very bad idea.
>
You should have a look at LTTng then. ;) And by the way, the kernel
marker infrastructure also uses RCU-style updates and is designed to be
NMI-safe from the start.
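As a rough illustration of why RCU-style updates make probe invocation usable from NMI context, here is a simplified sketch, assuming C11 atomics stand in for rcu_assign_pointer()/rcu_dereference(). The struct probe, fire_marker(), publish_probe() and sum_probe() names are hypothetical and are not the kernel marker API. Only the publish/read half of RCU is shown: the grace-period wait before freeing the old probe is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical probe descriptor; not the kernel's marker structure. */
struct probe {
    void (*func)(void *data, long arg);
    void *data;
};

/* Readers (possibly running in NMI context) only ever load this pointer. */
static _Atomic(struct probe *) active_probe;

/* Reader side: no lock, just an acquire load and an indirect call. */
static void fire_marker(long arg)
{
    struct probe *p = atomic_load_explicit(&active_probe, memory_order_acquire);
    if (p)
        p->func(p->data, arg);
}

/*
 * Updater side: fully initialize a new probe, then publish it with a
 * single atomic exchange (release semantics). Returns the previous probe,
 * which must not be freed until every reader that may still hold it has
 * finished, i.e. after a grace period. That wait is NOT done here.
 */
static struct probe *publish_probe(void (*func)(void *, long), void *data)
{
    struct probe *p = malloc(sizeof(*p));
    assert(p != NULL);
    p->func = func;
    p->data = data;
    return atomic_exchange_explicit(&active_probe, p, memory_order_acq_rel);
}

/* Example probe: adds arg to the long pointed to by data. */
static void sum_probe(void *data, long arg)
{
    *(long *)data += arg;
}
```

In a single-threaded program the caller can free the returned old probe immediately; with concurrent readers it would first have to wait for a grace period (synchronize_rcu() in the kernel).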
Mathieu
> -Andi
>
>
> [1] Now that I mentioned it I still have enough faith to assume nobody
> will be crazy enough to come up with some horrible hack to set markers
> in early boot code too. But after seeing this patchkit ending up in a
> git tree I'm not sure.
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68