Re: [Patch][RFC] Disabling per-tgid stats on task exit in taskstats

From: Shailabh Nagar
Date: Thu Jun 29 2006 - 20:37:28 EST


jamal wrote:

On Thu, 2006-29-06 at 16:01 -0400, Shailabh Nagar wrote:



Jamal,
any thoughts on the flow control capabilities of netlink that apply here ? Usage of the connection is to supply statistics data to userspace.




if you want reliable delivery, then you cant just depend on async events
from the kernel -> user - which i am assuming is the way stats get
delivered as processes exit?

Yes.

Sorry, i dont remember the details. You
need some synchronous scheme to ask the kernel to do a "get" or "dump".


Oh, yes. Dump is synchronous. So it won't be useful unless we buffer task exit records within
taskstats.

Lets be clear about one thing:
The problem really has nothing to do with gen/netlink or any other
scheme you use;->
It has everything to do with reliability implications and the fact
that you need to assume memory is a finite resource - at one point
or another you will run out of memory ;-> And of course then messages
will be lost. So for gen/netlink, just make sure you have large socket
buffer and you would most likely be fine. I havent seen how the numbers were reached: But if you say you receive
14K exits/sec each of which is a 50B message, I would think a 1M socket
buffer would be plenty.


The rates (or upper bounds) that are being discussed here, as of now, are 1000 exits/sec/CPU for
1024 CPU systems. That would be roughly 1M exits/system * 248Bytes/message = 248 MB/sec.

You can find out about lack of memory in netlink when you get a ENOBUFS.
As an example, you should then do a kernel query. Clearly if you do a
query of that sort, you may not want to find obsolete info. Therefore,
as a suggestion, you may want to keep sequence numbers of sorts as
markers. Perhaps keep a 32-bit field which monotically increases per
process exit or use the pid as the sequence number etc..

As for throttling - Shailabh, I think we talked about this:
- You could maintain info using some thresholds and timer. Then
when a timer expires or threshold is exceeded send to user space.


Hmm. So we could buffer the per-task exit data within taskstats (the mem consumption would grow
but thats probably not a problem) and then send it out later.

Jay - would not getting exit data soon after exit be a problem for CSA ? I'm guessing not, if the
timeout is kept small enough. Internally, taskstats could always pace its sends so that "too much"
isn't sent out at one shot.

--Shailabh


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/