Re: [RFC] perf: need to expose sched_clock to correlate usersamples with kernel samples

From: Pawel Moll
Date: Mon Apr 08 2013 - 13:58:26 EST

On Sat, 2013-04-06 at 12:05 +0100, Richard Cochran wrote:
> On Fri, Apr 05, 2013 at 07:16:53PM +0100, Pawel Moll wrote:
> > Ok, so how about the code below? Disclaimer: this is just a proposal.
> > I'm not sure how welcomed would be an extra field in struct file, but
> > this makes the clocks ultimately flexible - one can "attach" the clock
> > to any arbitrary struct file. Alternatively we could mark a "clocked"
> > file with a special flag in f_mode and have some kind of lookup.
> Only a tiny minority of file instances will want to be clocks.
> Therefore I think adding the extra field will be a hard sell.
> The flag idea sounds harmless, but how do you perform the lookup?

Hash table. I'll get some code typed and post it tomorrow.

> > Also, I can't stop thinking that the posix-clock.c shouldn't actually do
> > anything about the character device... The PTP core (as the model of
> > using character device seems to me just one of possible choices) could
> > do this on its own and have simple open/release attaching/detaching the
> > clock. This would remove a lot of "generic dev" code in the
> > posix-clock.c and all the optional cdev methods in struct posix_clock.
> > It's just a thought, though...
> Right, the chardev could be pushed into the PHC layer. The original
> idea of chardev clocks did have precedents, though, like hpet and rtc.

I'm not arguing about the use of cdev for PTP clocks, it's perfectly
fine with me. I'm just not convinced that the "more generic" clock layer
should enforce cdevs and nothing more. But I think we're more-or-less
talking the same language here, so I'll simply create a patch and send
it as RFC.

> > And a couple of questions to Richard... Isn't the kref_put() in
> > posix_clock_unregister() a bug? I'm not 100% but it looks like a simple
> > register->unregister sequence was making the ref count == -1, so the
> > delete_clock() won't be called.
> Well,
> posix_clock_register() -> kref_init() ->
> atomic_set(&kref->refcount, 1);
> So refcount is now 1 ...
> posix_clock_unregister() -> kref_put() -> kref_sub(count=1) ->
> atomic_sub_and_test((int) count, &kref->refcount)
> and refcount is now 0. Can't see how you would get -1 here.

Eh. For some reason I was convinced that kref_init() sets the counter to
0 not 1. My bad.

> > And was there any particular reason that the ops in struct
> > posix_clock are *not* a pointer?
> One less run time indirection maybe? I don't really remember why or
> how we arrived at this. The whole PHC review took a year, with
> something like fifteen revisions.

Ok. As most of the *_ops seem to be referenced via pointers (including
file ops, which are pretty heavily used ;-) and this makes it much
easier to define static clocks, I'll propose a change in a separate

Now, before I spend time doing all this, a question to John, Peter,
Stephane and the rest of the public - would a solution providing such
userspace interface:

fd = sys_perf_open()
timestamp = clock_gettime((FD_TO_CLOCKID(fd), &ts)

be acceptable to all?


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at