Re: [PATCH] fs/proc: introduce /proc/stat2 file

From: Andrew Morton
Date: Tue Nov 06 2018 - 18:48:45 EST


On Mon, 29 Oct 2018 23:04:45 +0000 Daniel Colascione <dancol@xxxxxxxxxx> wrote:

> On Mon, Oct 29, 2018 at 7:25 PM, Davidlohr Bueso <dave@xxxxxxxxxxxx> wrote:
> > This patch introduces a new /proc/stat2 file that is identical to the
> > regular 'stat' except that it zeroes all hard irq statistics. The new
> > file is a drop-in replacement for stat for users that need performance.
>
> For a while now, I've been thinking over ways to improve the
> performance of collecting various bits of kernel information. I don't
> think that a proliferation of special-purpose named bag-of-fields file
> variants is the right answer, because even if you add a few info-file
> variants, you're still left with a situation where a given file
> provides a particular caller with too little or too much information.
> I'd much rather move to a model in which userspace *explicitly* tells
> the kernel which fields it wants, with the kernel replying with just
> those particular fields, maybe in their raw binary representations.
> The ASCII-text bag-of-everything files would remain available for
> ad-hoc and non-performance critical use, but programs that cared about
> performance would have an efficient bypass. One concrete approach is
> to let users open up today's proc files and, instead of read(2)ing a
> text blob, use an ioctl to retrieve specified and targeted information
> of the sort that would normally be encoded in the text blob. Because
> callers would open the same file when using either the text or binary
> interfaces, little would have to change, and it'd be easy to implement
> fallbacks when a particular system doesn't support a particular
> fast-path ioctl.
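
For illustration only, the open-once-then-fall-back pattern described
above might look roughly like this from userspace. The request number
PROC_STAT_GET_CPU and the buffer layout of ten raw 64-bit counters are
made up: no such ioctl exists on /proc/stat, so on current kernels the
call fails and the text path runs.

```python
import fcntl
import struct

# Hypothetical request number, chosen only for illustration. No such
# ioctl exists on /proc/stat, so current kernels reject it.
PROC_STAT_GET_CPU = 0xB7F0
# user nice system idle iowait irq softirq steal guest guest_nice
N_FIELDS = 10


def read_cpu_times(path="/proc/stat"):
    """Return (values, how) for the aggregate cpu line."""
    with open(path, "rb", buffering=0) as f:
        buf = bytearray(8 * N_FIELDS)
        try:
            # Fast path: ask for raw 64-bit counters (assumed layout).
            fcntl.ioctl(f.fileno(), PROC_STAT_GET_CPU, buf)
            values = struct.unpack("=%dQ" % N_FIELDS, buf)
            return list(values), "ioctl"
        except OSError:
            pass  # fast path unsupported here: fall back to the text blob
        # Slow path: same file descriptor, ordinary read(2) of the text.
        for line in f.read().decode("ascii").splitlines():
            if line.startswith("cpu "):
                return [int(v) for v in line.split()[1:]], "text"
    raise ValueError("no aggregate cpu line in %s" % path)
```

The caller opens the same file either way, so the only thing that
changes on a kernel without the fast path is the second element of the
returned tuple.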

Yup. There are better ways of getting information out of the kernel,
to say the least.

It would be interesting to know precisely which stat fields the
database-which-shall-not-be-named is looking for. Then we could cook
up a very whizzy way of getting at the info.

A downside of the stat2 approach is that applications will need to be
rebuilt. And hopefully when people do this, they'll open
"/etc/my-app-name/symlink-to-proc-stat" (or use per-application config)
so they won't need a rebuild when we add /proc/stat3!
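
Something like the following is what I have in mind, as a sketch
rather than a recommendation. The environment variable name
MY_APP_STAT_PATH is invented for the example, and the symlink path is
the placeholder from the paragraph above.

```python
import os

DEFAULT_STAT = "/proc/stat"
# Placeholder path from the discussion, not a real convention.
APP_LINK = "/etc/my-app-name/symlink-to-proc-stat"


def stat_path(env=os.environ, link=APP_LINK):
    """Pick the stat file at run time so a new variant needs no rebuild."""
    override = env.get("MY_APP_STAT_PATH")  # hypothetical variable name
    if override:
        return override
    # os.path.exists() follows symlinks, so a dangling link is skipped.
    if os.path.exists(link):
        return link
    return DEFAULT_STAT
```

An admin would then repoint the symlink (or set the variable) to
/proc/stat2 and the application picks it up on its next start.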

A /proc/change-how-stat-works tunable would avoid the need to rebuild
applications. But if a system still has some applications which want
the irq info then that doesn't work.

It's all very sad, really.

btw,

> +The stat2 file acts as a performance alternative to /proc/stat for workloads
> +and systems that care and are under heavy irq load. In order to be completely
> +compatible, /proc/stat and /proc/stat2 are identical with the exception that the
> +later will show 0 for any (hard)irq-related fields. This refers particularly

"latter"

> +to the "intr" line and 'irq' column for that aggregate in the cpu line.
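
To make it concrete which values those are, here is a small sketch
that pulls them out of the text. It assumes the usual column order on
the aggregate cpu line (user nice system idle iowait irq ...) and that
the first number on the "intr" line is the total. The function name is
mine.

```python
def hardirq_fields(stat_text):
    """Return (cpu_irq, intr_total) parsed from /proc/stat-style text."""
    cpu_irq = None
    intr_total = None
    for line in stat_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "cpu" and len(parts) > 6:
            # parts[1:] is user nice system idle iowait irq ...
            cpu_irq = int(parts[6])
        elif parts[0] == "intr" and len(parts) > 1:
            # The first value after "intr" is the total count.
            intr_total = int(parts[1])
    return cpu_irq, intr_total
```

Per the description quoted above, both values would read back as 0
from stat2, so a consumer that depends on them can tell which file it
has been pointed at.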

btw2, please quantify "poor performance". That helps us determine how
much we care about all of this!