Re: ptrace API extensions for BTS
From: Roland McGrath
Date: Wed Jan 30 2008 - 02:26:19 EST
Sorry I did not get more into this discussion earlier. I still have not
read through all of the email threads. But I have looked over the current
version of your code now in -mm.
I think this work has a great deal of overlap with the perfmon2 project.
There are two facets that overlap, which together are the whole BTS feature.
The same x86 "debug store" hardware is programmed for both the BTS and the
performance monitoring features. The implementations clearly have to
cooperate on managing that hardware. Your ds.c is a start in the right
direction, to abstract the hardware configuration from the BTS feature and
its interface. I'm not familiar with the perfmon2 code, but it may have
something similar already.
The rest of the BTS feature is the buffer management and the interface.
It has to deal with the hardware buffer setup, context switching, and
overflow interrupts, and delivering data from the hardware buffers to the
interface in appropriate formats. We'd also like it to be able to trace
kernel-mode as well as user-mode, and either deliver combined data or
segregate the data between the two for user-space and kernel-space users
who need not know about each other's tracing. (On some of the hardware
you can program it to record only one or the other (X86_FEATURE_DSCPL).
On older hardware, or when separately tracing both, you can trace both
and then distinguish each sample by its from_ip.) perfmon2 also wants to
address all of that.
I don't much like the way you've shoe-horned the context-switch timestamp
logging into the BTS feature. It's a nice feature to have in some form,
and I sympathize with your seeing it as easy pickings once you had the
BTS buffer machinery handy. But really it is not part of the BTS feature
and there is nothing arch-dependent about it. Given some other general
thing providing the buffer management et al, that could just be done in
schedule(), i.e.:
departs(prev);
context_switch(rq, prev, next); /* unlocks the rq */
arrives(prev);
If there is a general thing for event-reporting from perfmon2 or
whatever, then it might be natural to have the context-switch event
reports configurable to different record formats you might be using
for other things. For a BTS-style record, I would use:
departs: { .from = task_pt_regs(prev)->ip, .to = jiffies, .flag = MAGIC1 }
arrives: { .from = jiffies, .to = task_pt_regs(next)->ip, .flag = MAGIC2 }
MAGIC1 = 0xffff0001
MAGIC2 = 0xffff0002
or something like that, i.e. as if it were a "branch to block-time" and a
"branch from wake-time". (Actually you might want MAGIC3 and MAGIC4 too,
for whether it was a voluntary or involuntary context switch.) For
different use that is doing mostly other event sampling rather than BTS,
it might use a different format that gives more register into a la PEBS.
I'm no expert on perfmon2 and I understand there are many issues to be
resolved to get it into the kernel. But if you are not desperate to have
the BTS feature in the kernel ASAP, it would ideal IMHO if you can work
with Stephane et al on putting this work together. I'd like to see the
work go into the kernel in much smaller pieces even than your BTS patch
set that's in -mm. The first thing is just the DS hardware management,
context switch and hardware-facing parts of the buffer management (one or
three or fourth small bisect-friendly patches just for that much). If
you and Stephane can hash out a fresh patch that provides what you both
need for that, that would be a great start. Personally, I'd prefer to
abandon the ptrace extensions altogether in favor of some generalized
event buffer interface that comes from merging perfmon2. But if you
still want to do the ptrace interface, it can be built on the shared
DS-management code. What do you think?
Thanks,
Roland
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/