Re: [RFC] kmemcheck: TODO for stack tracking

From: Ingo Molnar
Date: Fri Nov 28 2008 - 06:29:57 EST



* Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:

> Hi,
>
> Here's a plan for how to do stack tracking with kmemcheck. It is not
> entirely trivial, but as far as I can see, it SHOULD be possible.
> Please let me know if you can spot any fallacies or other problems.
> I've probably missed something...
>
> /*
> * TODO for stack tracking in kmemcheck:
> *
> * 1. Make kernel run at CPL = 1
> *
> * This includes (I guess) changing the various privilege levels in most
> * system descriptors and descriptor tables, and probably the IOPL. Are there
> * any CPU features which always require CPL = 0 to work? Paging requires no
> * change, as the U/S flag distinguishes between CPL = 0, 1, 2 and CPL = 3
> * only.

Hm, a ton of instructions need ring-0/supervisor privilege. OTOH, the
32-bit Xen hypervisor runs the guest kernel in ring 1, so most of these
places are already abstracted out to a fair degree via paravirt_ops.
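
(Side note on the paging remark in step 1: the reason paging needs no
change is that the U/S bit only separates "supervisor" (CPL 0, 1, 2) from
"user" (CPL 3). A simplified model of that check, ignoring R/W, WP, NX and
segmentation; the function name is made up for illustration:)

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (1u << 0)   /* x86 PTE bit 0: page is present */
#define PTE_USER    (1u << 2)   /* x86 PTE bit 2: U/S flag */

/*
 * Hypothetical helper, not a kernel function: would an access at the
 * given CPL pass the present + U/S checks for this PTE?
 */
static bool page_access_allowed(uint32_t pte, unsigned int cpl)
{
	if (!(pte & PTE_PRESENT))
		return false;		/* non-present page: #PF at any CPL */
	if (cpl < 3)
		return true;		/* CPL 0, 1 and 2 are all "supervisor" */
	return (pte & PTE_USER) != 0;	/* CPL 3 needs U/S set */
}
```

So a kernel at CPL 1 sees exactly the same page permissions as one at
CPL 0.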

> *
> * 2. Modify TSS to use separate stacks for CPL = 0 and CPL = 1
> * 3. Install a Call Gate for Page Faults in the GDT with DPL = 0
> * 4. Change IDT entry for #PF to point to Call Gate in GDT
> *
> * Now when a #PF occurs in kernel mode, CPU will look up the IDT entry for
> * #PF. It points to our Call Gate in the GDT, which has a different privilege
> * level, so the CPU will look up the new stack to use in the TSS. In the new
> * stack, SS, ESP, CS, and EIP are saved. Note: page_fault() will have to take
> * care of handling the extra SS/ESP parameters. End of note. Observe that the
> * old stack has not been touched by the CPU at all (this would lead to a #DF,
> * Double Fault, which is irrecoverable). Observe also that none of the
> * interrupted task's registers have been modified. Now the CPU transfers
> * control to page_fault(), which must save all registers, etc. as usual.
> *
> * do_page_fault() must NOT be allowed to enable interrupts, otherwise we
> * could take interrupts that would use the new stack. If the interrupt
> * handler takes another page fault, the CPU will already be in CPL = 0 and no
> * stack switch will occur!
> *
> * I think we need to make the kernel switch stacks on ALL interrupts. When
> * the CPU is interrupted, it will attempt to push CS/EIP on the current
> * stack. If the PTE of the current stack is non-present, a Page Fault will be
> * generated (not a Double Fault!). However, we have no way to tell if the #PF
> * was generated by an interrupt.
> *
> * 5. Implement support for PUSHA/POPA instruction handling in kmemcheck. No
> * extra support will be needed for IRET, as interrupts must not be allowed
> * to occur when the stack is located in a non-present page.
> *
> * Note that it is possible to track POPF/IRET instructions (even though they
> * modify EFLAGS and the Trap Flag), because the CPU does the right thing and
> * raises the Debug Exception based on the previous setting of TF.
> *
> * 6. The kernel stack tracer would need to be modified to understand stack
> * changes/boundaries.
> */

Sounds like a lot of work.

I'm wondering, how about a non-fault-driven approach: for example the
function tracer could be modified to poison stack frames as we return
from a function, and it could also check the poison value when we enter a
function call.

This is a high-overhead approach too - but ftrace could be modified to
provide a stack frame size parameter, so the poisoning and checking would
only touch the stack frame that is being entered/exited.

This would not give the same coverage as kmemcheck, but would cover the
common cases to a fair degree. (and would also be fairly
false-positive-safe)
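
To make the idea a bit more concrete, here is a rough user-space model of
it. Everything in it is an assumption for illustration only: the stack is
a plain byte array, and frame_enter()/frame_exit() stand in for whatever
hooks the function tracer would call with the frame size. It is not ftrace
code. The entry hook checks that the frame about to be used still holds
the poison pattern, the exit hook re-poisons the frame being left:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_POISON	0x6b	/* arbitrary poison byte for this sketch */
#define SIM_STACK_SIZE	256

static uint8_t sim_stack[SIM_STACK_SIZE];	/* simulated stack */
static size_t sim_sp = SIM_STACK_SIZE;		/* grows downwards */

/* Poison the whole stack once, before any frame is used. */
static void sim_stack_init(void)
{
	memset(sim_stack, STACK_POISON, sizeof(sim_stack));
	sim_sp = SIM_STACK_SIZE;
}

/*
 * Entry hook: allocate a frame and count the bytes in it that no longer
 * hold the poison value. A non-zero result means something wrote to this
 * part of the stack while no live frame owned it.
 */
static size_t frame_enter(size_t frame_size, uint8_t **frame)
{
	size_t i, bad = 0;

	assert(frame_size <= sim_sp);	/* no overflow handling here */
	sim_sp -= frame_size;
	*frame = &sim_stack[sim_sp];

	for (i = 0; i < frame_size; i++)
		if ((*frame)[i] != STACK_POISON)
			bad++;
	return bad;
}

/* Exit hook: poison the frame we are leaving, then release it. */
static void frame_exit(size_t frame_size)
{
	memset(&sim_stack[sim_sp], STACK_POISON, frame_size);
	sim_sp += frame_size;
}
```

A write through a stale pointer into an already-exited frame would show up
as a non-zero count the next time a frame covering that area is entered.
Reads of uninitialized stack data are not caught by this, which is part of
why it is weaker than the kmemcheck approach.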

Hm?

Ingo