I think PEBS is best supported by a generic abstraction. Something like this: it's basically a special sampling format that generates a record of:

    struct pt_regs regs;
    __u64 insn_latency; /* optional */
    __u64 data_address; /* optional */

This is pretty generic.

I see, so that's how you'd return the data. How would a user specify that they want to use PEBS?
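
For illustration, the generic record described above could be sketched in user-space C roughly as follows. This is only a sketch under assumptions: `struct sample_regs` is a hypothetical stand-in for `struct pt_regs` (whose real layout is architecture specific), and the `SAMPLE_HAS_*` flags and the `precise_sample_*` helpers are made-up names showing one way the two optional fields might be selected. None of this is an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct pt_regs; the real one is per-arch. */
struct sample_regs {
    uint64_t ip, sp, flags;
    uint64_t gpr[16];
};

/* Hypothetical flags: which optional fields follow the registers. */
#define SAMPLE_HAS_LATENCY   (1u << 0)
#define SAMPLE_HAS_DATA_ADDR (1u << 1)

/* In-memory form of the generic record. */
struct precise_sample {
    struct sample_regs regs;
    uint64_t insn_latency;   /* optional */
    uint64_t data_address;   /* optional */
};

/* Size in bytes of a packed record carrying the selected optional fields. */
static size_t precise_sample_size(unsigned int fields)
{
    size_t size = sizeof(struct sample_regs);

    if (fields & SAMPLE_HAS_LATENCY)
        size += sizeof(uint64_t);
    if (fields & SAMPLE_HAS_DATA_ADDR)
        size += sizeof(uint64_t);
    return size;
}

/*
 * Pack a sample into buf, emitting only the selected optional fields.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t precise_sample_pack(const struct precise_sample *s,
                                  unsigned int fields,
                                  unsigned char *buf, size_t len)
{
    size_t off = 0;

    assert(s != NULL && buf != NULL);
    if (len < precise_sample_size(fields))
        return 0;

    memcpy(buf + off, &s->regs, sizeof(s->regs));
    off += sizeof(s->regs);
    if (fields & SAMPLE_HAS_LATENCY) {
        memcpy(buf + off, &s->insn_latency, sizeof(s->insn_latency));
        off += sizeof(s->insn_latency);
    }
    if (fields & SAMPLE_HAS_DATA_ADDR) {
        memcpy(buf + off, &s->data_address, sizeof(s->data_address));
        off += sizeof(s->data_address);
    }
    return off;
}
```

The point of the flag word is that a consumer can work out the record length without knowing which CPU produced the sample.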
The raw CPU records have a CPU-specific format, and they have to be demultiplexed anyway (Nehalem can have up to four separate PEBS counters, but they all output into the same DS area), so the low-level arch code converts the CPU record into the above generic sample record when it copies it into the mmap pages. It's a quick copy, so no big deal performance-wise.
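
The demultiplex-and-convert step could look something like the sketch below. The raw record layout here is hypothetical and heavily simplified (it is not the actual Nehalem PEBS record), and `demux_records`, `sample_sink` and `count_sink` are names invented for the example. The assumption is that each raw record carries a status bitmask saying which counter(s) it belongs to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PRECISE_COUNTERS 4

/* Hypothetical, simplified CPU-specific record as found in the shared buffer. */
struct raw_cpu_record {
    uint64_t ip, sp, flags;
    uint64_t status;      /* assumed: bit n set means counter n produced it */
    uint64_t data_addr;
    uint64_t latency;
};

/* Generic sample; ip/sp/flags stand in for a full struct pt_regs. */
struct generic_sample {
    uint64_t ip, sp, flags;
    uint64_t insn_latency;
    uint64_t data_address;
};

/* Callback receiving one generic sample for one counter. */
typedef void (*sample_sink)(int counter, const struct generic_sample *s,
                            void *ctx);

/*
 * Walk the shared buffer, convert each raw record to the generic form and
 * hand it to every counter flagged in its status word.
 * Returns the number of samples emitted.
 */
static size_t demux_records(const struct raw_cpu_record *raw, size_t n,
                            sample_sink sink, void *ctx)
{
    size_t emitted = 0;

    assert(raw != NULL && sink != NULL);
    for (size_t i = 0; i < n; i++) {
        struct generic_sample s = {
            .ip = raw[i].ip,
            .sp = raw[i].sp,
            .flags = raw[i].flags,
            .insn_latency = raw[i].latency,
            .data_address = raw[i].data_addr,
        };

        for (int c = 0; c < MAX_PRECISE_COUNTERS; c++) {
            if (raw[i].status & (1ULL << c)) {
                sink(c, &s, ctx);
                emitted++;
            }
        }
    }
    return emitted;
}

/* Example sink: ctx is an array of per-counter sample counts. */
static void count_sink(int counter, const struct generic_sample *s, void *ctx)
{
    (void)s;
    ((uint64_t *)ctx)[counter]++;
}
```

In the real thing the sink would be the copy into the per-counter mmap pages; a counting sink is used here only to keep the sketch self-contained.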
( Details:
- there might be some additional complications from sampling 32-bit contexts, but that too is a mostly low level detail that gets hidden.
- we might use a slightly more compact register structure than struct pt_regs. OTOH it's a well-known structure, so it makes sense to standardize on it, even if the CPU doesn't sample all registers.
)
Can you see desirable PEBS-like PMU features that cannot be expressed via such means?

Power PMUs provide some fairly complex features, such as a thresholding mechanism which is used for marking instructions, and there's also an Instruction Matching CAM which can be used to mark only certain instruction types. Since these features are present only on Power, I'm not sure it makes sense to go to the trouble of abstracting them for use on other arch/chip designs.
Ingo