Re: [patch 23/24] perfmon: kernel documentation

From: stephane eranian
Date: Wed Nov 26 2008 - 13:22:16 EST


Andi,

On Wed, Nov 26, 2008 at 1:21 PM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
> On Wed, Nov 26, 2008 at 12:43:00AM -0800, eranian@xxxxxxxxxxxxxx wrote:
>
> I assume you'll be also submitting manpages with the same information?
>
This is on my TODO list: provide a man page for each new syscall.

>> +
>> + A monitoring session is uniquely identified by a file descriptor obtained
>> + when the session is created. File sharing semantics apply to access the
>> + session inside a process. A session is never inherited across fork. The file
>> + descriptor can be used to receive counter overflow notifications or when the
>> + sampling buffer is full. It is possible to use poll/select on the descriptor
>> + to wait for notifications from multiple sessions. Similarly, the descriptor
>> + supports asynchronous notifications via SIGIO.
>
> What happens when the fd is passed between processes using unix sockets fd
> passing?
>

I have never played with that myself, even with regular file descriptors.
But I can only assume passing a file descriptor increments its refcount.
Thus you simply get another controlling process. There is enough context
locking in place in the kernel to make this work.
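
For reference, here is a minimal sketch of what fd passing over a unix
socket looks like from user level, using the standard SCM_RIGHTS ancillary
message. It is not perfmon code: a plain pipe stands in for the session
descriptor, and the names send_fd(), recv_fd() and fd_pass_demo() are made
up for this illustration. It only shows that the receiver gets a new
descriptor for the same open file, which stays usable after the sender
closes its copy.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor over a connected AF_UNIX socket (hypothetical helper). */
static int send_fd(int sock, int fd)
{
    char byte = 'F';    /* one byte of ordinary data travels with the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;   /* forces correct alignment of buf */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new fd or -1 (hypothetical helper). */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Pass the write end of a pipe through a socketpair; 0 on success.
 * Error paths do not clean up, to keep the sketch short. */
int fd_pass_demo(void)
{
    int sv[2], p[2], passed;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0)
        return -1;
    if (send_fd(sv[0], p[1]) < 0)
        return -1;
    passed = recv_fd(sv[1]);
    if (passed < 0)
        return -1;
    assert(passed != p[1]);     /* a new descriptor number, same open file */

    close(p[1]);                /* the sender drops its reference */
    if (write(passed, "x", 1) != 1 || read(p[0], &c, 1) != 1)
        return -1;

    close(passed);
    close(p[0]);
    close(sv[0]);
    close(sv[1]);
    return c == 'x' ? 0 : -1;
}
```

In a real setup the two ends of the socket would live in different
processes; the demo keeps both in one process only so that it can be run
standalone.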


>> +
>> + We have released a simple monitoring tool to demonstrate the features of
>> + the interface. The tool is called pfmon and it comes with a simple helper
>> + library called libpfm. The library comes with a set of examples to show
>
> I don't think "simple" is the right word to describe pfmon/libpfm @)
>
The idea is simple; the implementation is more complicated. The complexity of
libpfm comes mostly from the complexity of the hardware: take Cray, Power,
Pentium4 and Itanium2, for instance ;->

>> + There maybe other tools available for perfmon.
>
> s/maybe/are/ ?
>
>> +
>> + To destroy a session, the regular close() system call is used.
>
Yes, there are other tools.

> ...
>
> Some simple syscall examples would be nice. e.g. how to set up a counter
> that it can be accessed using RDPMC on x86.

I can add this. But why go straight to RDPMC? Most people would want to use
the syscall instead.
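
For what it is worth, the user-level read side of RDPMC is small. Below is
a rough sketch of it only; the perfmon calls needed to program the counter
and allow user-level reads are not shown, because their signatures are not
part of this thread. The names pmc_combine(), read_pmc() and pmc_delta()
are made up for the example, and the counter width passed to pmc_delta()
is an assumption that depends on the processor.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the EDX:EAX halves returned by RDPMC into one 64-bit value. */
static inline uint64_t pmc_combine(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__i386__) || defined(__x86_64__)
/*
 * Read hardware counter 'index' from user level.
 * This faults unless the kernel has enabled user-level RDPMC (CR4.PCE),
 * so it must only be called once the counter has been set up.
 */
static inline uint64_t read_pmc(uint32_t index)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(index));
    return pmc_combine(lo, hi);
}
#endif

/*
 * Difference between two raw reads of a counter that is 'width' bits wide.
 * The mask makes the result correct across a single wraparound.
 */
static inline uint64_t pmc_delta(uint64_t before, uint64_t after,
                                 unsigned width)
{
    uint64_t mask;

    assert(width >= 1 && width <= 64);
    mask = width == 64 ? ~0ULL : (1ULL << width) - 1;
    return (after - before) & mask;
}
```

A tool would typically call read_pmc() before and after the region of
interest and feed both values to pmc_delta().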

>
>> + /sys/kernel/perfmon/arg_mem_max(read-write):
>> +
>> + Maximum size of vector arguments expressed in bytes.
>> + It can be modified but must be at least a page.
>> + Default: PAGE_SIZE
>
> Is there any good reason ever to enlarge this beyond a page?
>
> If it just depends on future hardware it would make more sense
> to let a driver patch for that adjust it.
>
It depends on the number of registers available. It is expected that most tools
will want to use one call to program the config registers and one to program
the data registers. Pfmon is able to split vectors according to arg_mem_max.
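
To illustrate the splitting, here is a small sketch of the arithmetic a
tool could do. This is not pfmon's actual code. The sysfs path is the one
from the documentation quoted above; the function names and the per-register
entry size used in the usage note are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read the limit from sysfs. If the file is missing or unreadable,
 * fall back to the page size, which is the documented default.
 */
size_t get_arg_mem_max(void)
{
    size_t val = 0;
    FILE *f = fopen("/sys/kernel/perfmon/arg_mem_max", "r");

    if (f) {
        if (fscanf(f, "%zu", &val) != 1)
            val = 0;
        fclose(f);
    }
    if (val == 0) {
        long page = sysconf(_SC_PAGESIZE);
        val = page > 0 ? (size_t)page : 4096;
    }
    return val;
}

/*
 * Number of calls needed to pass 'nregs' entries of 'entry_size' bytes
 * each when one call accepts at most 'limit' bytes.
 * Returns 0 when there is nothing to pass or when not even one entry fits.
 */
size_t calls_needed(size_t nregs, size_t entry_size, size_t limit)
{
    size_t per_call;

    assert(entry_size > 0);
    per_call = limit / entry_size;
    if (per_call == 0)
        return 0;
    return (nregs + per_call - 1) / per_call;   /* round up */
}
```

For example, with a hypothetical 256-byte entry and a 4096-byte limit, 16
entries fit in one call, so a vector of 35 registers needs
calls_needed(35, 256, 4096) = 3 calls.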

It is anticipated that newer processors will increase the number of available
PMU registers. That was the case with Barcelona, with the addition of IBS.
On Intel x86, I am planning on exposing the LBR as part of the PMU registers.

On Itanium, you already have 35 data and 27 config registers.

But I think your suggestion is interesting. When we "register" the new PMU
mapping table, we could provide the minimum size needed to fit all PMC or all
PMD registers in one call. That would remove a control point for the sysadmin,
though.