Re: [announce] Performance Counters for Linux, v6

From: Arnd Bergmann
Date: Thu Feb 26 2009 - 08:39:17 EST


On Thursday 26 February 2009, Paul Mackerras wrote:
> Arnd Bergmann writes:
> No, it's just an oversight, plus the fact that this stuff doesn't
> support the hardware counters on Cell, which I've heard is rather
> unusual. Don't you have to use hcalls to access it on machines with a
> hypervisor? Also, is there any facility for counting events on the
> SPUs?

Yes, there is, and it's supported in both oprofile and perfmon2.
It uses a set of hooks in the spu scheduler to learn what code
got loaded into each SPU, and the platform interface is a
combination of mmio and rtas/hcall accesses.
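
For reference, here is a minimal sketch of what those hooks look
like from a profiler's point of view, loosely modeled on the
notifier interface that the oprofile Cell code registers with the
spu scheduler. The function name and callback arguments are
recalled from memory, so treat them as assumptions and check
spufs/sched.c for the authoritative interface:

#include <linux/init.h>
#include <linux/notifier.h>
#include <asm/spu.h>

/* called on every SPU context switch; object_id identifies the
 * program that was just loaded, so samples can be attributed to
 * the right code */
static int spu_active_notify(struct notifier_block *self,
			     unsigned long object_id, void *spu)
{
	/* record (spu, object_id) for later sample attribution */
	return 0;
}

static struct notifier_block spu_active = {
	.notifier_call	= spu_active_notify,
};

static int __init my_profiler_init(void)
{
	/* spu_switch_event_register() is the spufs entry point the
	 * oprofile Cell code uses to hook context switches */
	return spu_switch_event_register(&spu_active);
}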

> > > +static const struct file_operations perf_fops = {
> > > +	.release	= perf_release,
> > > +	.read		= perf_read,
> > > +	.poll		= perf_poll,
> > > +	.unlocked_ioctl	= perf_ioctl,
> > > +	.compat_ioctl	= perf_ioctl,
> > > +};
> >
> > It feels inconsistent to combine a syscall for getting the fd with ioctl.
>
> Why? Once you have an fd, what's inconsistent about doing an ioctl on
> it?

I don't think we have any interface doing that today outside of
the traditional chardev, blockdev, socket and regular file
descriptors. The first inconsistency would be having yet another
type of object that you can do ioctl on.

Back in the original spufs interface design, I had similar plans
for adding a combination of syscall and ioctl interfaces. What
convinced me to go with just syscalls and regular file operations
was that syscall and ioctl are basically two interchangeable ways
of doing the same thing: adding an arbitrary function call to the
kernel. When you look in /proc/misc, half the stuff in there only
exists as a cheap way to sneak a syscall into the kernel without
the necessary review.

Combining the two would be as inconsistent as making
sys_timerfd_settime an ioctl on the fd returned by
sys_timerfd_create, or getdents an ioctl on a directory fd.
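
To make the analogy concrete, this is what timerfd would look
like if settime had been an ioctl; TFD_IOC_SETTIME and its
argument struct are invented here purely for illustration:

#include <sys/timerfd.h>
#include <sys/ioctl.h>

struct tfd_settime_args {
	int flags;
	struct itimerspec new_value;
	struct itimerspec old_value;	/* written back by the kernel */
};
#define TFD_IOC_SETTIME _IOWR('T', 1, struct tfd_settime_args)

static int arm_timer(int tfd, const struct itimerspec *its)
{
	/* today's interface: a dedicated syscall taking the fd */
	return timerfd_settime(tfd, 0, its, NULL);
}

static int arm_timer_ioctl(int tfd, const struct itimerspec *its)
{
	/* the hypothetical equivalent: same fd, same information,
	 * just multiplexed through ioctl() */
	struct tfd_settime_args args = { .flags = 0, .new_value = *its };

	return ioctl(tfd, TFD_IOC_SETTIME, &args);
}

Either form carries exactly the same information across the same
boundary; the difference is mostly in how much review the
interface gets on its way into the kernel.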

The most significant impact of allowing ioctl on the fd is
certainly that it makes it easy to add many more interfaces on
top of the performance counters as soon as the code is initially
merged. Whether you count that as an advantage or a disadvantage
depends on your point of view ;-).

> > Assuming the interface is semantically right, I'd suggest using either
> > a chardev with more ioctls or more syscalls but not both:
> >
> > asmlinkage long sys_perfcounter_fd(
> > 	struct perf_counter_hw_event __user *hw_event_uptr,
> > 	pid_t pid,
> > 	int cpu,
> > 	int group_fd);
> > asmlinkage long sys_perfcounter_setflags(int fd, u32 flags);
> >
> > or
> >
> > struct perf_counter_setup {
> > 	struct perf_counter_hw_event event;
> > 	__u64 pid;
> > 	__u32 cpu;
> > 	__u32 group_fd;
> > };
> > #define PERF_COUNTER_IOC_SETUP _IOW('$', 1, struct perf_counter_setup)
> > #define PERF_COUNTER_IOC_SETFLAGS _IOW('$', 2, __u32)
>
> I don't see any compelling value in either of those suggestions,
> sorry. Possibly, if it turns out that read() or ioctl() are just too
> slow (and can't be improved), we might consider adding syscalls.

I don't think that performance is an issue here: read() is quite
well optimized, and ioctl() would only be used in the slow path.

My second example is a bit contrived; if you wanted to go with a
chardev, that would probably imply larger interface changes than
the ones shown there.

One way to improve the interface over PERF_COUNTER_IOC_ENABLE
might be to use write() for enable/disable, but if
sys_perfcounter_fd (or whatever we call it) stays, using other
syscalls for operating on it seems to be the most consistent way
of doing it.
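
As a userspace sketch of that write() alternative (no such
interface exists in the posted patches, this only illustrates the
idea), enable/disable could look like:

#include <unistd.h>

static int counter_set_enabled(int counter_fd, int enable)
{
	/* a single byte is enough to carry the on/off state */
	char c = enable ? '1' : '0';

	return write(counter_fd, &c, 1) == 1 ? 0 : -1;
}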

Another point that just occurred to me: in sys_perfcounter_fd(),
you should probably put the group descriptor argument first, to
be more consistent with the openat() family of syscalls, which
all take the relative position as their first argument.
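
Spelled out, the reordered prototype would be (same arguments as
quoted above, purely a suggestion):

asmlinkage long sys_perfcounter_fd(
	int group_fd,
	struct perf_counter_hw_event __user *hw_event_uptr,
	pid_t pid,
	int cpu);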

Arnd <><