Re: [Patch][RFC] Disabling per-tgid stats on task exit in taskstats

From: Shailabh Nagar
Date: Fri Jun 30 2006 - 23:35:50 EST


Andrew Morton wrote:

On Fri, 30 Jun 2006 22:20:23 -0400
Shailabh Nagar <nagar@xxxxxxxxxxxxxx> wrote:



If we're going to abuse nl_pid then how about we design things so that
nl_pid is treated as two 16-bit words - one word is the start CPU and the
other word is the end cpu?

Or, if a 65536-CPU limit is too scary, make the bottom 8 bits of nl_pid be
the number of CPUs (i.e. TASKSTATS_CPUS_PER_SET) and the top 24 bits be the
starting CPU.
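
For concreteness, the two nl_pid layouts suggested above could be sketched roughly as follows. This is only an illustration: the helper names are hypothetical and not part of any kernel or netlink API, and for the 16/16 split the choice of which half holds the start CPU is an assumption, since the proposal does not say.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not an existing kernel API. */

/* Layout A (assumed ordering): high 16 bits = start CPU, low 16 bits = end CPU. */
static uint32_t pack_range16(uint32_t start_cpu, uint32_t end_cpu)
{
    assert(start_cpu <= 0xffffu && end_cpu <= 0xffffu);
    return (start_cpu << 16) | end_cpu;
}

static void unpack_range16(uint32_t nl_pid, uint32_t *start_cpu, uint32_t *end_cpu)
{
    *start_cpu = nl_pid >> 16;
    *end_cpu = nl_pid & 0xffffu;
}

/* Layout B: top 24 bits = starting CPU, bottom 8 bits = number of CPUs. */
static uint32_t pack_start_count(uint32_t start_cpu, uint32_t ncpus)
{
    assert(start_cpu <= 0xffffffu && ncpus <= 0xffu);
    return (start_cpu << 8) | ncpus;
}

static void unpack_start_count(uint32_t nl_pid, uint32_t *start_cpu, uint32_t *ncpus)
{
    *start_cpu = nl_pid >> 8;
    *ncpus = nl_pid & 0xffu;
}
```

With these, pack_range16(2, 5) gives 0x00020005 and pack_start_count(300, 8) gives 0x00012C08. Either way the listener's socket address stops being a plain pid, which is the "abuse" referred to above.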

<avoids mentioning nl_pad>

It'd be better to use a cpumask, of course..




All these options mean each listener gets to pick a "custom" range of cpus to listen on,
rather than choose one of the pre-defined ranges (even if the pre-defined ranges can change
via a configurable TASKSTATS_CPUS_PER_SET). That means the kernel side has to figure out
which listeners' cpu ranges include the currently exiting task's cpu. To do this, we'll need
a callback from the binding of the netlink socket (so taskstats can maintain the
cpu -> nl_pid mappings for use at any exit).
The current genetlink interface doesn't have that kind of flexibility (though it can be
added, I'm sure).

Seems a bit involved if the primary aim is to restrict the number of cpus that one listener
listens to, rather than to let it pick which ones.

Won't a configurable range suffice?




Set aside the implementation details and ask "what is a good design?"

A kernel-wide constant, whether determined at build time or by a /proc poke,
isn't a nice design.

Can we permit userspace to send in a netlink message describing a cpumask? That's back-compatible.


Yes, that should be doable. And passing in a cpumask is much better since we no longer
have to maintain mappings.

So the strawman is:
Listener bind()s to genetlink using its real pid.
It then sends a separate "registration" message with the cpumask to listen to. Kernel stores
the (real) pid and cpumask.
During task exit, the kernel goes through each registered listener (small list), decides which
ones need to get this exit data, and calls a genetlink_unicast to each one that does.

If the number of listeners is small, the lookups should be swift enough. If it grows large, we
can consider a fancier lookup (but there I go again, delving into implementation too early :-)
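
To make the strawman concrete, here is a rough userspace model of the bookkeeping it implies. Everything in it is an assumption made for illustration: the struct, the function names, the fixed-size table and the 64-bit mask (so at most 64 cpus) are all hypothetical, and the actual genetlink unicast is replaced by collecting the matching pids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LISTENERS 8  /* hypothetical cap; the list is expected to be small */

/* Hypothetical record kept per registered listener. */
struct listener {
    uint32_t pid;      /* real pid the listener bind()ed with */
    uint64_t cpumask;  /* bit n set => wants exit data from cpu n */
    bool in_use;
};

static struct listener listeners[MAX_LISTENERS];

/* Models handling of the "registration" message: store pid and cpumask.
 * Returns 0 on success, -1 if the table is full. */
static int register_listener(uint32_t pid, uint64_t cpumask)
{
    for (size_t i = 0; i < MAX_LISTENERS; i++) {
        if (!listeners[i].in_use) {
            listeners[i].pid = pid;
            listeners[i].cpumask = cpumask;
            listeners[i].in_use = true;
            return 0;
        }
    }
    return -1;
}

/* Models the task-exit path: linear walk over the listeners, collecting
 * the pid of each one whose cpumask covers the exiting task's cpu.
 * A real implementation would unicast the exit data to each such pid. */
static size_t listeners_for_cpu(unsigned int cpu, uint32_t *out, size_t out_len)
{
    size_t n = 0;

    assert(out != NULL);
    if (cpu >= 64)
        return 0;  /* outside what this sketch's 64-bit mask can express */

    for (size_t i = 0; i < MAX_LISTENERS && n < out_len; i++) {
        if (listeners[i].in_use && (listeners[i].cpumask & (UINT64_C(1) << cpu)))
            out[n++] = listeners[i].pid;
    }
    return n;
}
```

For example, after register_listener(100, 0x0f) and register_listener(300, 0xff), an exit on cpu 2 matches both pids, while an exit on cpu 5 matches only pid 300. The cost per exit is one pass over the table, which is the "swift enough for a small list" tradeoff mentioned above.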


Sounds good to me !

--Shailabh






