Re: [PATCH 0/5] ftrace: to kill a daemon

From: Mathieu Desnoyers
Date: Fri Aug 08 2008 - 14:21:23 EST


* Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
>
> On Fri, 8 Aug 2008, Mathieu Desnoyers wrote:
> > * Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> > >
> > > On Fri, 8 Aug 2008, Mathieu Desnoyers wrote:
> > > > * Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> > > > >
> > > > > I originally used jumps instead of nops, but unfortunately, they actually
> > > > > hurt performance more than adding nops. Ingo told me it was probably due
> > > > > to using up the jump predictions of the CPU.
> > > > >
> > > >
> > > > Hrm, are you sure you use a single 5-bytes nop instruction then, or do
> > > > you use a mix of various nop sizes (add_nops) on some architectures ?
> > >
> > > I use (for x86) what is in include/asm-x86/nops.h depending on what the
> > > cpuid gives us.
> > >
> >
> > That's bad :
> >
> > #define GENERIC_NOP5 GENERIC_NOP1 GENERIC_NOP4
> >
> > #define K8_NOP5 K8_NOP3 K8_NOP2
> >
> > #define K7_NOP5 K7_NOP4 ASM_NOP1
> >
> > So, when you try, later, to replace these instructions with a single
> > 5-bytes instruction, a preempted thread could iret in the middle of your
> > 5-bytes insn and cause an illegal instruction ?
>
> That's why I use kstop_machine.
>

kstop_machine does not guarantee that no thread was preempted _before_
the modification with its IP pointing into the middle of your nop
sequence (i.e. at the start of the second nop). When such a thread is
scheduled back in _after_ the modification, it resumes in the middle of
the new 5-byte instruction and can hit an illegal instruction.

Still buggy. :/
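
To make the failure concrete, here is a minimal user-space sketch (not
kernel code; SITE_LEN, mark_boundaries() and count_stale_ips() are names
made up for this illustration). It only models instruction lengths, taken
from the NOP5 definitions quoted above (e.g. K8_NOP5 is a 3-byte nop
followed by a 2-byte nop), and counts the saved IPs that were valid
before the rewrite but land mid-instruction afterwards:

```c
#include <assert.h>
#include <stddef.h>

#define SITE_LEN 5	/* size of the patched call site, in bytes */

/*
 * Mark every offset inside the site at which an instruction starts.
 * lens[] holds the instruction lengths, which must sum to SITE_LEN.
 */
static void mark_boundaries(const size_t *lens, size_t n,
			    int boundary[SITE_LEN])
{
	size_t off = 0;

	for (size_t i = 0; i < SITE_LEN; i++)
		boundary[i] = 0;
	for (size_t i = 0; i < n; i++) {
		assert(off < SITE_LEN);
		boundary[off] = 1;
		off += lens[i];
	}
	assert(off == SITE_LEN);
}

/*
 * Count the saved IPs that were instruction boundaries before the
 * rewrite but point into the middle of an instruction afterwards.
 */
static size_t count_stale_ips(const size_t *old_lens, size_t n_old,
			      const size_t *new_lens, size_t n_new)
{
	int before[SITE_LEN], after[SITE_LEN];
	size_t stale = 0;

	mark_boundaries(old_lens, n_old, before);
	mark_boundaries(new_lens, n_new, after);
	for (size_t i = 0; i < SITE_LEN; i++)
		if (before[i] && !after[i])
			stale++;
	return stale;
}
```

With old_lens = {3, 2} and new_lens = {5}, offset 3 is the one stale
resume point: a thread preempted right after the first nop returns
there. If the site already holds a single 5-byte instruction, the count
is 0, because offset 0 is the only boundary both before and after.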

> >
> >
> > > >
> > > > You can consume the branch prediction buffers for conditional branches,
> > > > but I doubt static jumps have this impact ? I don't see what "jump
> > > > predictions" you are referring to here exactly.
> > >
> > > I don't know the details, but we definitely saw a drop in performance
> > > between using nops and static jumps.
> > >
> >
> > Generated by replacing all the call by 5-bytes jumps e9 00 00 00 00
> > instead of the 5-bytes add_nops ? On which architectures ?
> >
>
> I ran this on my Dell (Intel Xeon), which IIRC did show the performance
> degradation. I unfortunately don't have the time to redo those tests, but
> you are welcome to.
>
> Just look at arch/x86/kernel/ftrace.c and replace the nop with the jump.
> In fact, the comments in that file still say it is a jmp. Remember, my
> first go was to use the jmp.
>

I'll try to find time to compare:

- multi-instruction 5-byte nops (although this approach is just buggy)
- a 5-byte jump to the next address
- a 2-byte jump to offset +3
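
For reference, a rough user-space sketch of the two jump encodings
(helper names are made up for this illustration; nothing here is taken
from arch/x86/kernel/ftrace.c). The 5-byte form is the e9 00 00 00 00
mentioned above. For the 2-byte form I assume a short jmp (opcode eb)
with displacement 3; the three bytes it skips are never executed, and
filling them with 0x90 is purely my assumption. jump_target() assumes a
little-endian host when decoding the rel32 displacement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SITE_LEN 5	/* size of the patched call site, in bytes */

/* 5-byte jump to the next address: jmp rel32 with displacement 0. */
static void fill_jmp32(uint8_t site[SITE_LEN])
{
	site[0] = 0xe9;
	memset(&site[1], 0, 4);
}

/* 2-byte jump to offset +3: jmp rel8 over three filler bytes. */
static void fill_jmp8(uint8_t site[SITE_LEN])
{
	site[0] = 0xeb;
	site[1] = SITE_LEN - 2;			/* displacement = 3 */
	memset(&site[2], 0x90, SITE_LEN - 2);	/* assumed filler */
}

/* Offset, relative to the site start, where execution continues. */
static int jump_target(const uint8_t site[SITE_LEN])
{
	if (site[0] == 0xe9) {
		int32_t rel;

		memcpy(&rel, &site[1], sizeof(rel));
		return 5 + rel;		/* relative to end of the jmp */
	}
	assert(site[0] == 0xeb);
	return 2 + (int8_t)site[1];
}

/* Fill a scratch site with the given encoder and decode its target. */
static int target_after(void (*fill)(uint8_t *))
{
	uint8_t site[SITE_LEN];

	fill(site);
	return jump_target(site);
}
```

Both forms continue at offset 5, i.e. right after the site, and both
have exactly one instruction boundary (offset 0) that can be reached, so
neither leaves a resume point in the middle of the site the way the
multi-instruction nops do.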

Mathieu

> -- Steve
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68