Re: [Patch][RFC] Disabling per-tgid stats on task exit in taskstats
From: Shailabh Nagar
Date: Wed Jul 05 2006 - 10:08:53 EST
jamal wrote:
> Shailabh,
>
> On Tue, 2006-04-07 at 12:37 -0400, Shailabh Nagar wrote:
> [..]
>
>>Here's a strawman for the problem we're trying to solve: get
>>notification of the close of a NETLINK_GENERIC socket that had
>>been used to register interest for some cpus within taskstats.
>>
>>From looking at the netlink code, the way to go seems to be
>>
>>- taskstats maintains a pidhash of nl_pids that are currently
>>registered to listen to at least one cpu. It also stores the
>>cpumask used.
>>- taskstats registers a notifier block within netlink_chain
>>and receives a callback on the NETLINK_URELEASE event, similar
>>to drivers/scsi/scsi_transport_iscsi.c: iscsi_rcv_nl_event()
>>
>>- the callback checks to see that the protocol is NETLINK_GENERIC
>>and that the nl_pid for the socket is in taskstats' pidhash. If so, it
>>does a cleanup using the stored cpumask and releases the nl_pid
>>from the pidhash.
>>
>
>
> Sounds quite reasonable. I am beginning to wonder whether we should
> do the NETLINK_URELEASE in general for NETLINK_GENERIC
I'd initially thought that might be useful, but since NETLINK_GENERIC
is only "virtually" multiplexing the sockfd amongst each of its users,
I don't know what benefits a generic notifier at the NETLINK_GENERIC layer
would bring (as opposed to each NETLINK_GENERIC user directly registering
its callback with netlink). Perhaps simplicity?
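
To make the "each user registers its own callback" option concrete, here
is a rough userspace model, not kernel code. The two protocol/event
constants mirror the values in linux/netlink.h; the chain, the
release_event struct and both users (including "other") are made up for
illustration. Every callback sees every release and has to filter on
protocol and on its own private set of nl_pids.

```c
/* Userspace model only. Constants mirror linux/netlink.h; the rest is hypothetical. */
#define MODEL_NETLINK_GENERIC  16
#define MODEL_NETLINK_URELEASE 0x0001
#define MAX_CALLBACKS          8

struct release_event {
    int pid;       /* nl_pid of the socket being closed */
    int protocol;  /* netlink protocol of that socket */
};

typedef int (*release_cb)(unsigned long event, const struct release_event *ev);

static release_cb chain[MAX_CALLBACKS];
static int chain_len;

/* Each user adds its own callback to the shared chain. */
static int chain_register(release_cb cb)
{
    if (chain_len == MAX_CALLBACKS)
        return -1;
    chain[chain_len++] = cb;
    return 0;
}

/* Invoked on socket close; returns how many callbacks claimed the event. */
static int chain_call(unsigned long event, const struct release_event *ev)
{
    int handled = 0;

    for (int i = 0; i < chain_len; i++)
        handled += chain[i](event, ev);
    return handled;
}

/* Two hypothetical genetlink users, each tracking a single listener nl_pid. */
static int taskstats_listener = 1001;
static int other_listener = 2002;

/* Common filter: right event, right protocol, and a pid this user knows about. */
static int claim(int *slot, unsigned long event, const struct release_event *ev)
{
    if (event != MODEL_NETLINK_URELEASE || ev->protocol != MODEL_NETLINK_GENERIC)
        return 0;
    if (*slot != ev->pid)
        return 0;
    *slot = -1;    /* forget the listener */
    return 1;
}

static int taskstats_release_cb(unsigned long event, const struct release_event *ev)
{
    return claim(&taskstats_listener, event, ev);
}

static int other_release_cb(unsigned long event, const struct release_event *ev)
{
    return claim(&other_listener, event, ev);
}
```

A generic notifier at the genetlink layer would only move the protocol
check into one place; the per-user pid lookup would remain either way.
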
>>We can even do away with the deregister command altogether and
>>simply rely on this autocleanup.
>
>
> I think you may still need the register if you are going to allow
> multiple sockets per listener process, no?
The register command, yes. But an explicit deregister, as opposed to
auto cleanup on fd close, may not be used all that much :-)
> The other question is how do you correlate pid -> fd?
For the notifier callback, I thought netlink_release will
provide the nl_pid corresponding to the fd being closed ?
I can just do a search for that nl_pid in the taskstats-private pidhash.
The nl_pid gets into the pidhash using the genl_info->pid field
when the listener issues the register command.
Will that be correct ?
So here's the sequence of pids being used/hashed etc. Please let
me know if my assumptions are correct ?
1. Same listener thread opens 2 sockets.
   On sockfd1, does a bind() using
       sockaddr_nl.nl_pid = my_pid1
   On sockfd2, does a bind() using
       sockaddr_nl.nl_pid = my_pid2
   (one of the my_pid values could be its process pid but doesn't have to be)

2. Listener supplies cpumasks on each of the sockets through a
   register command, e.g. one sent on sockfd1.
   In the kernel, when the command is received,
   the genl_info->pid field contains my_pid1.
   my_pid1 is stored in a pidhash along with the corresponding cpumask.
   The cpumask is used to store my_pid1 into per-cpu lists for each
   cpu in the mask.

3. When an exit event happens on one of those cpus in the mask,
   it is sent to this listener using
       genlmsg_unicast(...., my_pid1)

4. When the listener closes sockfd1, netlink_release() gets called
   and that calls a taskstats notifier callback (say taskstats_cb) with
       struct netlink_notify n =
           { .protocol = NETLINK_GENERIC, .pid = my_pid1 }
   and using the .pid within, taskstats_cb can do a lookup within its
   pidhash. If it's present, use the cpumask stored alongside to go
   clean up my_pid1 stored in the listener list of each cpu in the mask.
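
A minimal userspace sketch of steps 2 to 4, assuming a fixed-size table
in place of a real hash and a 32-bit integer in place of a cpumask. This
is a model for discussion, not kernel code: NR_CPUS, MAX_LISTENERS,
struct model_netlink_notify and every function except the taskstats_cb
name are invented here, and nothing is actually sent over a socket.

```c
/* Userspace model only: not kernel code. All sizes and helpers are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS               8    /* assumed small cpu count for the model */
#define MAX_LISTENERS         16   /* assumed capacity of the toy "pidhash" */
#define MODEL_NETLINK_GENERIC 16   /* same value as NETLINK_GENERIC */

struct listener {
    uint32_t nl_pid;    /* value the listener passed to bind() */
    uint32_t cpumask;   /* bit n set => interested in exits on cpu n */
    int      in_use;
};

struct model_netlink_notify { int pid; int protocol; };

static struct listener pidhash[MAX_LISTENERS];          /* linear table standing in for a hash */
static uint32_t cpu_listeners[NR_CPUS][MAX_LISTENERS];  /* per-cpu lists of nl_pids */
static int      cpu_nlisteners[NR_CPUS];

static struct listener *pidhash_find(uint32_t nl_pid)
{
    for (int i = 0; i < MAX_LISTENERS; i++)
        if (pidhash[i].in_use && pidhash[i].nl_pid == nl_pid)
            return &pidhash[i];
    return NULL;
}

/* Step 2: store nl_pid with its cpumask and add it to each cpu's list. */
static int taskstats_register(uint32_t nl_pid, uint32_t cpumask)
{
    if (pidhash_find(nl_pid))
        return -1;                      /* already registered */
    for (int i = 0; i < MAX_LISTENERS; i++) {
        if (pidhash[i].in_use)
            continue;
        pidhash[i] = (struct listener){ nl_pid, cpumask, 1 };
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            if (cpumask & (1u << cpu))
                cpu_listeners[cpu][cpu_nlisteners[cpu]++] = nl_pid;
        return 0;
    }
    return -1;                          /* table full */
}

/* Step 3: copy out the nl_pids that an exit on `cpu` would be unicast to. */
static int taskstats_exit_targets(int cpu, uint32_t *out)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    memcpy(out, cpu_listeners[cpu], cpu_nlisteners[cpu] * sizeof(*out));
    return cpu_nlisteners[cpu];
}

/* Step 4: release callback; uses the stored cpumask to undo step 2. */
static void taskstats_cb(const struct model_netlink_notify *n)
{
    struct listener *l;

    if (n->protocol != MODEL_NETLINK_GENERIC)
        return;
    l = pidhash_find((uint32_t)n->pid);
    if (!l)
        return;                         /* socket never registered with us */
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!(l->cpumask & (1u << cpu)))
            continue;
        for (int j = 0; j < cpu_nlisteners[cpu]; j++) {
            if (cpu_listeners[cpu][j] != l->nl_pid)
                continue;
            /* overwrite with the last entry and shrink the list */
            cpu_listeners[cpu][j] = cpu_listeners[cpu][--cpu_nlisteners[cpu]];
            break;
        }
    }
    l->in_use = 0;
}
```

The point of keeping the cpumask next to the nl_pid is that the cleanup
in step 4 only has to visit the cpus the listener registered for, rather
than scanning every per-cpu list.
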
--Shailabh