Re: [Patch][RFC] Disabling per-tgid stats on task exit in taskstats

From: Shailabh Nagar
Date: Thu Jun 29 2006 - 21:10:01 EST


Andrew Morton wrote:

Shailabh Nagar <nagar@xxxxxxxxxxxxxx> wrote:


The rates (or upper bounds) that are being discussed here, as of now, are 1000 exits/sec/CPU for
1024 CPU systems. That would be roughly 1M exits/system * 248Bytes/message = 248 MB/sec.



I think it's worth differentiating between burst rates and sustained rates
here.

One could easily imagine 10,000 threads all exiting at once, and the user
being interested in reliably collecting the results.

But if the machine is _sustaining_ such a high rate then that means that
these exiting tasks all have a teeny runtime and the user isn't going to be
interested in the per-thread statistics.

So if we can detect the silly sustained-high-exit-rate scenario then it
seems to me quite legitimate to do some aggressive data reduction on that. Like, a single message which says "20,000 sub-millisecond-runtime tasks
exited in the past second" or something.


The "buffering within taskstats" might be a way out then.
As long as the user is willing to pay the price in terms of memory, we can collect the exiting task's
taskstats data but not send it immediately (taskstats_cache would grow) unless a high water mark had
been crossed. Otherwise a timer event would do the sends of accumalated taskstats (not all at once but
iteratively if necessary).

At task exit, despite doing a few rounds of sending of pending data, if netlink were still reporting errors
then it would be a sign of unsustainable rate and the pending queue could be dropped and a message
like you suggest could be sent.

Thoughts ?


--Shailabh

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/