Re: [RFC][PATCH 0/3]Djprobe (Direct Jump Probe) for 2.6.14-rc5-mm1
From: Hiro Yoshioka
Date: Mon Dec 12 2005 - 06:15:04 EST
Hi,
HTML format has been rejected so I'd like to resend this mail again.
I was a lucky enough to attend Hiramatsu san's presentation at a
kernel reading party at YLUG (Yokohama Linux Users Group)
http://ylug.jp/
It is a really cool idea and I like it :-)
Regards,
Hiro
On 10/31/05, Masami Hiramatsu <hiramatu@xxxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> I would like to propose djprobe (Direct Jump Probe) for low overhead
> probing.
> The djprobe is useful for the performance analysis function and the
> kernel flight-recording function which constantly traces events in
> the kernel. Because we should make their influence on performance as
> small as possible.
> Djprobe is a kind of probes in kernel like kprobes.
> It has some features:
> - Jump instruction based probe. This is so fast.
> - Non interruption.
> - Safely code insertion on SMP.
> - Lockless probe after registered.
> I attached detailed document of djprobe to this mail. If you need
> more information, please see it.
>
> This djprobe is NOT a replacement of kprobes. Djprobe and kprobes
> have complementary qualities. (ex: djprobe's overhead is low, and
> kprobes can be inserted in anywhere.)
> You can use both kprobes and djprobe as the situation demands.
>
> I measured the overhead of the djprobe on Pentium4 3.06GHz PC by
> using gtodbench (*). The result I got was about 100ns. In the view
> of performance, I think djprobe is the best probe method. What would
> you think about this?
>
> (*)The gtodbench is micro benchmark which is included in published
> djprobe source package. You can download it from LKST's web site:
> http://prdownloads.sourceforge.net/lkst/djprobe-20050713.tar.bz2
>
> The following three patches introduce djprobe (Direct Jump Probe)
> to linux-2.6.14-rc5-mm1.
> patch 1: Introduce a instruction slot management structure to
> handle different size slots. (a patch for kprobes)
> patch 2: Djprobe core (arch-independant) patch.
> patch 3: Djprobe i386 (arch-dependant) patch.
>
> Please try to use djprobe.
>
> Any comments or suggestions are welcome.
>
> Best regards,
>
> --
> Masami HIRAMATSU
> 2nd Research Dept.
> Hitachi, Ltd., Systems Development Laboratory
> E-mail: hiramatu@xxxxxxxxxxxxxxxxx
>
>
> Djprobe Documentation
> authors: Satoshi Oshima (soshima@xxxxxxxxxx)
> Masami Hiramatsu (hiramatu@xxxxxxxxxxxxxxxxx)
>
> INDEX
>
> 1. Djprobe concepts
> 2. How djprobe works
> 3. Further Considerations
> 4. Djprobe Features
> 5. Architectures Supported
> 6. Configuring Djprobe
> 7. API Reference
> 8. TODO
> 9. FAQ
>
> 1. Djprobe concepts
>
> The basic idea of Djprobe is to dynamically hook at any kernel function
> entry points and collect the debugging or performance analysis information
> non-disruptively. The functionality of djprobe is very similar to Kprobe
> or Jprobe. The distinction of djprobe is to use jump instruction instead
> of break point instruction. This distinction reduces the overhead of each
> probe.
>
> Developers can trap at almost any kernel function entry points, specifying
> a handler routine to be invoked when the jump instruction is executed.
>
>
> 2. How Djprobe works
>
> Break point instruction is easily inserted on most architecture.
> For example, binary size of break point instruction on i386 or x86_64
> architecture is 1 byte. 1 byte replacement is took place in single step.
> And replacement with breakpoint instruction is guaranteed as SMP safe.
>
> On the other hand jump instruction is not easily inserted. Binary size of
> jump instruction on i386 is 5 byte. 5 byte replacement cannot be executed
> in single step. And beyond that dynamic code modification has some
> complicated restriction.
>
> To explain the djprobe mechanism, we introduce some terminology.
> Image certain binary line which is constructed by 2 byte instruction,
> 2byte instruction and 3byte instruction.
>
> IA
> |
> [-2][-1][0][1][2][3][4][5][6][7]
> [ins1][ins2][ ins3 ]
> [<- DCR ->]
> [<- JTPR ->]
>
> ins1: 1st Instruction
> ins2: 2nd Instruction
> ins3: 3rd Instruction
> IA: Insertion Address
> JTPR: Jump Target Prohibition Region
> DCR: Detoured Code Region
>
>
> The replacement procedure of djpopbes is 6 steps:
>
> (1) copying instruction(s) in DCR
> (2) putting break point instruction at IA
> (3) scheduling works on each CPU
> (4) executing CPU safety check on each work
> (5) replacing original instruction(s) with jump instruction without
> first byte and serializing code
> (6) replacing break point instruction with first byte of jump instruction
>
> Further explanation is given below.
>
> (1) copying instruction(s) in DCR
>
> Djprobe copies replaced instruction(s) to the region that djprobe allocates.
> The replaced instructions must include the instruction that includes the byte
> at IA+4. Therefore the size of DCR must be 5 byte or more. The size of DCR
> must be given by djprobes user.
>
> (2) putting break point instruction at IA
>
> Djprobe replaces a break point instruction at Insertion Point. After this
> replacement, the djprobe act like kprobes.
>
> (3) scheduling works on each CPU
>
> Djprobe schedules work(s) that execute CPU safety check on each CPU, and wait
> till those works finished.
>
> (4) executing CPU safety check on each work
>
> Current Djprobe suppose that the context switch must NOT occur on extension
> of interruption, which means that every interruption must return before
> executing context switch.
>
> Therefore, execution of scheduled works itself is the proof that every
> interruption stack (and every process stack) doesn't include any address
> in JTPR.
>
> The last CPU that executes safety check work wakes the waiting process up.
>
> (5) replacing original instruction(s) with jump instruction without first
> byte and serializing code
>
> After all safety check works are scheduled, djprobe can replace the codes
> in JTPR safely. Because, now, any CPU is not executing JTPR. Even if a CPU
> tries to execute the instructions in the top of DCR again, the CPU is
> interrupted by kprobe and is led to execute the copied instructions. So
> any CPU does not touch the instructions in the JTPR.
>
> Djprobe replaces the bytes in the area from IA+1 to IA+4 with jump
> instruction that doesn't contain first byte.
> And it serializes the replaced code on every CPU.
>
> (6) replacing breakpoint instruction with first byte of jump instruction
>
> Djprobe replaces breakpoint instruction that is put by themselves with the
> first byte of jump instruction.
>
>
> 3. Further considerations
>
> There are many difficulties on implementation of djprobe. In this section,
> we discuss restrictions of djprobe to understand these difficulties.
>
> 3.1 the way to confirm safety of DCR(dynamic analysis)
>
> Djprobe tries to replace the code that includes one instruction or more.
> This replacement usually accompanies changing the boundaries of instructions.
> Therefore djprobe must ensure that the other CPUs don't execute DCR or every
> stack doesn't contain the address in JTPR.
>
> 3.2 confirmation of safety of DCR(static analysis)
>
> Djprobe must also avoid JTPR must not be targeted by any jump or call
> instruction. Basically this must be extremely difficult to take place.
> But some point such as function entry point can be expected that is not
> target of jump or call instruction (because function entry point contains
> fixed form that ensures the code convention.)
>
> 4. Djprobe Features
>
> - Djprobe can probe entries of almost all functions without any interruption.
>
> 5. Architecture Supported
>
> - i386
>
>
> 6. Configuring Djprobe
> When configuring the kernel using make menuconfig/xconfig/oldconfig, ensure
> that CONFIG_DJPROBE is set to "y". Under "Instrumentation Support",
> look for "Direct Jump probe". You may have to enable "Kprobes" and to
> *DISABLE* "Preemptible Kernel".
>
> 7. API Reference
> The Djprobe API includes "register_djprobe" function and
> "unregister_djprobe" function. Here are specifications for these functions
> and the associated probe handlers.
>
> 7.1 register_djprobe
>
> #include <linux/djprobe.h>
> int register_djprobe(struct djprobe *djp, void *addr, int size);
>
> Inserts a jump instruction at the address addr. When the jump is
> hit, Djprobe calls djp->handler.
>
> register_djprobe() returns 0 on success, or a negative errno otherwise.
>
> User's probe handler (djp->handler):
> #include <linux/djprobe.h>
> #include <linux/ptrace.h>
> void handler(struct djprobe *djp, struct pt_regs *regs);
>
> Called with p pointing to the djprobe associated with the probe point,
> and regs pointing to the struct containing the registers saved when
> the probe point was hit.
>
> 7.2 unregister_djprobe
>
> #include <linux/djprobe.h>
> void unregister_djprobe(struct djprobe *djp);
>
> Removes the specified probe. The unregister function can be called
> at any time after the probe has been registered.
>
>
> 8. TODO
>
> (1)support architecture transparent interface.
> (Djprobe interface emulated by kprobes)
> (2)bulk registeration interface support
> (3)kprobe interoperability (coexistance in same address)
> (4)other architectures support
>
> 9. FAQ
> Direct Jump Probe Q&A
>
> Q: What is the Direct Jump Probe (Djprobe)?
> A: Djprobe is a low overhead probe method for linux kernel.
>
> Q: What is different from Kprobes?
> A: The most different feature is that the djprobe uses a jump instruction
> code instead of breakpoint instruction code. It can reduce overheads of
> probing especially when the probes are executed frequently.
>
> Q: How does the djprobe work?
> A: First, Djprobe copies some instructions modified by a jump instruction
> into the middle of a stub code buffer. Next, it overwrites the instructions
> with the jump instruction whose destination is the top of that stub code
> buffer. In the top of the stub code buffer, there is a call instruction
> which calls a probe function. And, in the bottom of the stub code buffer,
> there is a jump instruction whose destination is the next of the modified
> instructions.
> On the other hand, Kprobe copies only one instruction which will be
> modified by breakpoint instruction, and overwrites it breakpoint
> instruction. When breakpoint interruption handling, it executes the copied
> instruction with the trap flag. When trap interruption handling, it
> corrects IP(*) for returning to the kernel code.
> So, djprobe's work sequence is "jump", "probe", "execute copies" and
> "jump", whereas kprobes' sequence is "break", "probe", "execute copies",
> and "trap".
>
> (*)Instruction Pointer
>
> Q: Does the djprobe need to modify kernel source code?
> A: No. The djprobe is one of the dynamic probes. It can be inserted into
> running kernel.
>
> Q: Can djprobe work with CPU-hotplug?
> A: Yes, djprobe locks cpu-hotplug in the critical section.
>
> Q: Where can the djprobe be inserted in?
> A: Djprobe can be inserted in almost all kernel code including the head of
> almost kernel functions. The insertion area must satisfy the assumptions
> described below.
>
> (In i386 architecture)
> IA
> |
> [-2][-1][0][1][2][3][4][5][6][7]
> [ins1][ins2][ ins3 ]
> [<- DCR ->]
> [<- JTPR ->]
>
> ins1: 1st Instruction
> ins2: 2nd Instruction
> ins3: 3rd Instruction
> IA: Insertion Address
> DCR (Detoured Code Region): The area which is including the instructions
> whose first byte is in the range in 5 bytes (this size is from the size of
> jump instruction) from the insertion address. These instructions are copied
> into the middle of a stub code buffer.
> JTPR (Jump Target Prohibition Region): The area which is including the
> codes among codes rewritten in the jump instruction by djprobe except the
> first one byte.
>
> Assumptions:
> i) The insertion address points the first byte of an instruction.
> This is for avoidance of a bad instruction exception.
> ii) There are no instructions which refer IP (ex. relative jmp) in DCR.
> EIP has been changed when copied instruction is executed.
> iii) There are no instructions which occur context-switch (ex. call
> schedule()) in DCR.
> If a context-switch occurs in DCR, the next address of an instruction
> (ex. the address of "ins2") is stored in the call stack of previous thread.
> After that, djprobe overwrites the instruction with jump instruction. When
> the previous thread switches back, it resumes execution from the stored
> address. So it will cause a bad instruction exception.
> iv) Destination address of jump or call is not included in JTPR.
> This is for avoidance of a bad instruction exception too.
>
> Q: Can several djprobes be inserted in the same address?
> A: Yes. Several djprobes which are inserted in the same address are
> aggregated and share one instance.
> NOTE: When a new djprobe's insertion address is in another djprobe's JTPR
> (above described), or the another djprobe's insertion address is in the new
> djprobe's JTPR, register_djprobe() fails to register the new djprobe and
> returns -EEXIST error code.
>
> Q: Can djprobe be used with kprobes in same address?
> A: No, currently djprobe can not coexist with kprobes in same address. But
> we will support this feature as soon as possible.
>
> Q: Should the jump instruction be with in a page boundary to avoid access
> violation and page fault?
> A: No. The x86 processors can handle non-aligned instructions correctly. We
> can see many non-aligned instructions in the kernel binary. And, in the
> kernel space, there is no page fault. Kernel code pages are always mapped
> to the kernel page table.
> So it is not necessary to care of page boundaries in x86 architecture.
>
> Q: How does the djprobe resolve problems about self/cross-modifying code?
> In Pentium Series, Unsynchronized cross-modifying code operations except
> the first byte of an instruction can cause unexpected instruction
> execution results.
> A: Djprobe uses a trick code to resolve the problems. It modifies the
> instructions as following.
> 1) Register special handler as a kprobe handler. (And a break point
> instruction is written on the first byte of the insertion address by
> kprobes.)
> 2) Check safety (this is described in the next question's answer).
> 3) Write only the destination address part of jump instruction on the
> kernel code. (This operation is not synchronized)
> 4) Call "cpuid" on each processor for synchronization.
> 5) Write the first byte of the jump instruction. (This operation is
> synchronized automatically)
>
> Q: How does the djprobe guarantee no threads and no processors are
> executing the modifying area? The IP of that area may be stored in the
> stack memory of those threads.
> A: The problem would be caused for three reasons:
> i) Problem caused by the multi processor system
> Another processor may be executing the area which is overwritten by jump
> instruction. Djprobe should guarantee no processor is executing those
> instructions when modify it.
> ii) Problem caused by the interruption
> An interruption might have occurred in the area which is going to be
> overwritten by jump instruction. Djprobe should guarantee all
> interruptions which occurred in the area have finished.
> iii) Problem caused by full preempt kernel
> In case of Problem (iii), it is described in the next question's answer.
>
> The Djprobe uses the workqueue to resolve Problem (i) and (ii). The
> solution is described below:
> 1) Copy the entire of the DCR (described above) into the middle of a stub
> code buffer.
> 2) Register special handler as a kprobe handler. This special handler
> changes kprobe's resume point to the stub code buffer.
> 3) Clear the safety flags of all processors.
> 4) Register a safety checking work to the workqueue on each processor. And
> wait till those works are scheduled.
> 5) When keventd thread is scheduled on a processor, it executes the work.
> In this time, this processor is not executing the area which is
> overwritten by jump instruction. And also it has finished all
> interruptions. Because, in the case of voluntary preemption or non
> preemption kernel, the context switch does not occur in the extension of
> interruption.
> 6) The all works are scheduled, djprobe writes the jump instruction.
>
> Q: Can the djprobe work with kernel full preemption?
> A: No, but you can use the djprobe's interface. When kernel full preemption
> is enabled, we can't ensure that no threads are executing the modified
> area. It may be stored in the stack of the threads. In this case, the
> djprobe interfaces are emulated by using kprobe.
> The latest linux kernel supports not only full preemption but also the
> voluntarily preemption. In the case of voluntarily preemption, threads
> are scheduled from only limited addresses. So it is easy to check that
> the preemption can not occur in the modified area.
>
>
>
>
--
Hiro Yoshioka
mailto:hyoshiok at miraclelinux.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/