Re: [Patch][RFC] Disabling per-tgid stats on task exit in taskstats

From: Shailabh Nagar
Date: Thu Jun 29 2006 - 15:59:54 EST


Andrew Morton wrote:

On Thu, 29 Jun 2006 15:10:31 -0400
Shailabh Nagar <nagar@xxxxxxxxxxxxxx> wrote:



I agree, and I'm viewing this as blocking the taskstats merge. Because if
this _is_ a problem then it's a big one because fixing it will be
intrusive, and might well involve userspace-visible changes.




First off, just a reminder that this is inherently a netlink flow control issue, which was being exacerbated
earlier by taskstats' decision to send per-tgid data (no longer the case).

But I'd like to know what our target is here. How many messages per second do we want to be able to send
and receive without risking any loss of data? Netlink will lose messages at a high enough rate, so the design point
needs to be known, at least roughly.

For statistics-type usage of genetlink/netlink, I would have thought that userspace, provided it is reliably informed
about the loss of data through ENOBUFS, could take measures to just account for the missing data and carry on?
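
To make the "account for the missing data and carry on" idea concrete, here is a minimal userspace sketch. It assumes a listener that already has an open, bound netlink socket; struct listener_stats, classify_recv() and recv_one() are hypothetical names for illustration and are not part of taskstats. Note that ENOBUFS only says that some messages were lost, not how many, so the listener can only count overrun events.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical listener bookkeeping; these names are not part of taskstats. */
struct listener_stats {
    unsigned long received;  /* messages read successfully */
    unsigned long overruns;  /* times the kernel reported dropped messages */
};

enum recv_outcome { RECV_OK, RECV_OVERRUN, RECV_RETRY, RECV_FATAL };

/* Classify one recv() result: n is its return value, err is errno. */
enum recv_outcome classify_recv(ssize_t n, int err, struct listener_stats *st)
{
    if (n >= 0) {
        st->received++;
        return RECV_OK;
    }
    if (err == ENOBUFS) {
        /* Receive buffer overflowed: some messages are gone.
         * Record the gap and keep reading. */
        st->overruns++;
        return RECV_OVERRUN;
    }
    if (err == EINTR || err == EAGAIN)
        return RECV_RETRY;
    return RECV_FATAL;
}

/* Read one message from an already-open, bound netlink socket. */
enum recv_outcome recv_one(int fd, void *buf, size_t len,
                           struct listener_stats *st)
{
    ssize_t n = recv(fd, buf, len, 0);
    return classify_recv(n, n < 0 ? errno : 0, st);
}
```

A listener loop would call recv_one() repeatedly, treat RECV_OVERRUN as "mark a gap in the data and continue", and stop only on RECV_FATAL.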



Could be so. But we need to understand how significant the impact of this
will be in practice.

We could find, once this is deployed in real production environments on
large machines, that the data loss is sufficiently common and sufficiently
serious that the feature needs a lot of rework.

Now there's always a risk of that sort of thing happening with all
features, but it's usually not this evident so early in the development
process. We need to get a better understanding of the risk before
proceeding too far.


Ok.

I suppose we should first determine how many tasks can be forked/exited at a sustained rate
on these machines; that would be one upper bound.

Paul, Chris, Jay,
What total exit rate would be a good upper bound? How much memory do these 1024-CPU machines
have (in high-end configurations, not just based on 64-bit addressability), and how many tasks can actually be
forked/exited on such a machine?
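
For illustration only, the relationship between exit rate, listener read rate and buffering can be sketched with a toy model. Everything here is an assumption made for the sketch: time advances in fixed ticks, the receive buffer holds a fixed number of messages, and simulate_drops() is a hypothetical helper, not a measurement of real netlink behaviour.

```c
/* Toy model: all names and the tick-based timing are illustrative assumptions. */
struct drop_result {
    unsigned long delivered;  /* messages the listener read */
    unsigned long dropped;    /* messages lost to buffer overflow */
};

struct drop_result simulate_drops(unsigned ticks,
                                  unsigned long exits_per_tick,
                                  unsigned long reads_per_tick,
                                  unsigned long capacity)
{
    struct drop_result r = {0, 0};
    unsigned long queued = 0;

    for (unsigned t = 0; t < ticks; t++) {
        /* Each exiting task queues one message. */
        queued += exits_per_tick;

        /* Anything beyond the buffer capacity is lost. */
        if (queued > capacity) {
            r.dropped += queued - capacity;
            queued = capacity;
        }

        /* The listener drains at most reads_per_tick messages. */
        unsigned long drained =
            queued < reads_per_tick ? queued : reads_per_tick;
        queued -= drained;
        r.delivered += drained;
    }
    return r;
}
```

In this model, 8 exits per tick against 5 reads per tick with room for 10 messages loses nothing in the first tick but has dropped 4 messages by the end of the third. A short burst is absorbed; a sustained excess is not.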

And there's always a 100% reliable fix for this: throttling. Make the
sender of the messages block until the consumer can catch up. In some
situations, that is what people will want to be able to do.

Is this really an option for taskstats? Allowing exits to get throttled? I suppose it's one way,
but it seems like overkill for something like stats.
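
To show what throttling would cost, here is a toy model under stated assumptions: time advances in fixed ticks, the buffer holds a fixed number of messages, and a sender that finds the buffer full waits rather than dropping. simulate_throttle() and its parameters are hypothetical names for this sketch only.

```c
/* Toy model of a blocking sender: names and tick timing are assumptions. */
struct throttle_result {
    unsigned long delivered;  /* messages the listener read */
    unsigned long blocked;    /* exits still waiting for buffer space */
};

struct throttle_result simulate_throttle(unsigned ticks,
                                         unsigned long exits_per_tick,
                                         unsigned long reads_per_tick,
                                         unsigned long capacity)
{
    struct throttle_result r = {0, 0};
    unsigned long queued = 0;   /* messages sitting in the buffer */
    unsigned long waiting = 0;  /* exiting tasks blocked on a full buffer */

    for (unsigned t = 0; t < ticks; t++) {
        waiting += exits_per_tick;

        /* Admit only as many senders as the buffer has room for. */
        unsigned long room = capacity - queued;
        unsigned long admitted = waiting < room ? waiting : room;
        queued += admitted;
        waiting -= admitted;

        /* The listener drains at most reads_per_tick messages. */
        unsigned long drained =
            queued < reads_per_tick ? queued : reads_per_tick;
        queued -= drained;
        r.delivered += drained;
    }
    r.blocked = waiting;
    return r;
}
```

With 8 exits per tick, 5 reads per tick and room for 10 messages, no data is lost, but after three ticks 4 exiting tasks are stalled waiting on the listener. That stall is the price of the 100% reliable option.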

I suspect a
good implementation would be to run a collection daemon on each CPU and
make the delivery be cpu-local. That's sounding more like relayfs than
netlink.


Yup...per-cpu, high speed delivery is looking like relayfs alright.

One option that we've not explored in detail is the "dump" functionality of genetlink, which allows
kernel space to keep getting called with skbs to fill until it's done. How much buffering that affords us
in the face of a slow user is not known. But if we're discussing large exit rates happening in bursts rather
than at a sustained rate, that may be one way out.

Jamal,
any thoughts on the flow control capabilities of netlink that apply here? The connection is used to
supply statistics data to userspace.

--Shailabh
