Re: [PATCH] Linux Kernel Markers 0.5 for Linux 2.6.17 (with probe management)

From: Ingo Molnar
Date: Fri Sep 22 2006 - 12:55:47 EST



* Mathieu Desnoyers <compudj@xxxxxxxxxxxxxxxxxx> wrote:

> Some details are worth mentioning:
>
> - The 5-NOP variant implies replacing five 1-byte instructions with
> one 5-byte instruction, which is trickier. Masami Hiramatsu's proposal
> of a 2-byte near jump + 3 NOPs is nicer.

yeah.
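
To make the difference between the two marker layouts concrete, here is a
minimal userspace sketch in C. It is only an illustration: the array and
function names are hypothetical, the opcodes are the standard x86 ones
(0x90 NOP, 0xEB short jump, 0xE9 jmp rel32), and the "decoder" understands
nothing but those two marker opcodes. It counts the offsets strictly inside
the 5-byte region where an instruction pointer can rest, which is presumably
why the jump + 3 NOPs form is easier to replace with a single 5-byte
instruction.

```c
#include <assert.h>
#include <stdint.h>

#define MARKER_LEN 5

/* Variant A: five separate 1-byte NOPs. */
static const uint8_t marker_5nop[MARKER_LEN] = { 0x90, 0x90, 0x90, 0x90, 0x90 };

/* Variant B: 2-byte jump with an 8-bit displacement of +3, then 3 NOPs. */
static const uint8_t marker_jmp_3nop[MARKER_LEN] = { 0xEB, 0x03, 0x90, 0x90, 0x90 };

/*
 * Walk the marker from offset 0 and count the instruction boundaries that
 * fall strictly inside it (offsets 1..4). Returns -1 for anything this toy
 * decoder does not understand, including backward jumps.
 */
static int interior_ip_count(const uint8_t m[MARKER_LEN])
{
    int ip = 0, count = 0;

    while (ip < MARKER_LEN) {
        if (m[ip] == 0x90) {
            ip += 1;
        } else if (m[ip] == 0xEB && ip + 1 < MARKER_LEN) {
            int8_t disp = (int8_t)m[ip + 1];
            if (disp < 0)
                return -1;
            ip += 2 + disp;     /* displacement is relative to the next insn */
        } else {
            return -1;
        }
        if (ip < MARKER_LEN)
            count++;
    }
    return count;
}

/*
 * Encode the 5-byte replacement "jmp rel32" for a 32-bit address space.
 * The displacement is relative to the end of the 5-byte instruction and is
 * stored little-endian.
 */
static void encode_jmp_rel32(uint8_t out[MARKER_LEN], uint32_t site, uint32_t target)
{
    uint32_t rel = target - (site + MARKER_LEN);

    out[0] = 0xE9;
    out[1] = (uint8_t)(rel & 0xff);
    out[2] = (uint8_t)((rel >> 8) & 0xff);
    out[3] = (uint8_t)((rel >> 16) & 0xff);
    out[4] = (uint8_t)((rel >> 24) & 0xff);
}
```

With five NOPs an interrupted or preempted context can hold a saved
instruction pointer at offsets 1 to 4; with the jump variant execution goes
from offset 0 straight to offset 5, so the tail bytes are never the start of
an executed instruction.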

> - Patching such a 5-byte instruction memory region doesn't turn
> markers into a complete function call, which includes argument
> passing.

please read:

Message-ID: <20060921185029.GB12048@xxxxxxx>

about how to turn it into a complete function call / jump.

[ i came back to this point from the end of the mail, where you consider
this an "interesting idea" - so i guess your above claim is moot. If
it is not moot then by all means i'm happy to further debate it. ]

> > And one reason why i want to avoid "static, build-time function call
> > based probing" is because high-performance computing users dont want
> > any overhead at all in the kernel fastpath.
>
> - The argument "most of the users wont use a particular feature"
> contradicts what you said earlier about every distribution wanting
> to enable a tracing mechanism for their users.

enabling a feature does not mean it's actually used by most users!
For example, Fedora currently enables:

CONFIG_NETPOLL=y

but only a tiny, tiny minority of users actually make use of it. Then
why is it still enabled? Because it has little to zero overhead to have
it enabled but not utilized. In the fastpath it has zero overhead. (and
yes, i originally designed and implemented netpoll/netconsole to be like
that, and i intentionally shaped it to be zero-overhead because i knew
it would be used as a debug feature.)

the same goes for tracing. We want to have it enabled in distros, but
only if it's near zero-cost. But because tracing wants to be there in
every fastpath, we have to work /hard/ to achieve this desired
end-result. I am "in your way" unfortunately /precisely/ because we want
to enable tracing in Fedora, but we want to pay as few cycles for it as
possible on the 99.998% of Fedora boxes that wont be using any tracing
at all.

> > > - High performance computing users won't want a breakpoint-based probe
> >
> > I am not forcing breakpoint-based probing, at all. I dont want _static,
> > build-time function call based_ probing, and there is a big difference.
> > And one reason why i want to avoid "static, build-time function call
> > based probing" is because high-performance computing users dont want any
> > overhead at all in the kernel fastpath.
> >
>
> I think that the performance benefits gained by using tracing
> information for studying a system make the overhead of a jump in the
> kernel fast path insignificant. [...]

sorry, but as Alan mentioned before, this leads to "death by a
million cuts". See above to understand what kind of feature is
acceptable to a distro (and hence to the upstream kernel) and what not.

> > > - djprobe is far away from being in an acceptable state on
> > > architectures with very inconvenient errata (x86).
> >
> > djprobes over a NOP marker are perfectly usable and safe: just add a
> > simple constraint to them to only allow a djprobes insertion if it
> > replaces a 5-byte NOP.
> >
>
> 2-byte jump + 3 bytes of NOPs. Yes, it should modify it without causing an
> illegal instruction, but how? Are you aware that their approach has to:
> - put an int3
> - wait for _all_ the CPUs to execute this int3
> - then change the 5 bytes instruction
>
> I can think of a lot of cases where the CPUs will never execute this
> int3. Sending an IPI or launching a kernel thread on each CPU to make
> sure that this int3 is executed could probably give more guarantees
> there. [...]

this is easy to solve, for example via the use of freeze_processes() and
thaw_processes().
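
As a rough illustration of the ordering only, the sketch below models the
patch sequence in userspace C on a plain byte buffer. Everything in it is an
assumption made for the example: stub_freeze() and stub_thaw() are
hypothetical stand-ins for the freeze_processes()/thaw_processes() pair and
merely count calls, patch_marker() is an invented name, and the marker is
taken to be the 2-byte jump + 3 NOPs form. It also applies the constraint
discussed further up in the thread: refuse to patch anything that is not the
known marker.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MARKER_LEN 5

/* Hypothetical stand-ins for freezing and thawing tasks; they only count. */
static int freeze_calls, thaw_calls;
static int stub_freeze(void) { freeze_calls++; return 0; }
static void stub_thaw(void)  { thaw_calls++; }

/* Assumed marker layout: jump over 3 NOPs (0xEB 0x03 0x90 0x90 0x90). */
static const uint8_t nop_marker[MARKER_LEN] = { 0xEB, 0x03, 0x90, 0x90, 0x90 };

/*
 * Replace the marker at 'site' with the 5-byte instruction 'insn'.
 * Returns 0 on success, -1 if the site is not the expected marker or the
 * (stubbed) freeze fails. Nothing is written in the failure cases.
 */
static int patch_marker(uint8_t *site, const uint8_t insn[MARKER_LEN])
{
    if (memcmp(site, nop_marker, MARKER_LEN) != 0)
        return -1;                  /* only ever patch over the known marker */

    if (stub_freeze() != 0)
        return -1;

    site[0] = 0xCC;                 /* int3 guards the site during the update */
    memcpy(site + 1, insn + 1, MARKER_LEN - 1);   /* rewrite the 4 tail bytes */
    site[0] = insn[0];              /* finally swap int3 for the real opcode */

    stub_thaw();
    return 0;
}
```

A real implementation would also have to deal with instruction-cache
synchronization and with contexts that are not ordinary tasks; none of that
is modelled here.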

> > > - kprobe and djprobe cannot access local variables in every case
> >
> > it is possible with the marker mechanism i outlined before:
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=115886524224874&w=2
> >
> > have i failed to address any concern of yours?
>
> Interesting idea. That would make it possible to probe local variables
> at the marker site. That's very good for use of kprobes on low rate
> debug-type markers, but that doesn't solve my concern about the
> cat-and-mouse race expressed earlier about live kernel polymorphic
> code.

i dont consider polymorphic code a problem at all. Java JITs (and the
dot-NET runtime) are faced with it every day and they have no problem
_dynamically rewriting high-performance code all the time_. In fact, in
the kernel we have /more/ tools to solve such problems. We can disable
interrupts, we can flush caches, we can send IRQs, we have total control
over every task in the system, etc., etc.

Furthermore, it is even more of a side issue in our case because unlike
Java JITs, for us the polymorphic code is in the /absolute slowpath/. We
could even turn off all caches while doing the code patching and nobody
would notice. (but such drastic measures are not needed at all)

Ingo