Re: "statsfs" API design

From: Paolo Bonzini
Date: Sun Nov 10 2019 - 08:05:15 EST


On 09/11/19 16:49, Greg Kroah-Hartman wrote:
> On Wed, Nov 06, 2019 at 04:56:25PM +0100, Paolo Bonzini wrote:
>> Hi all,
>>
>> statsfs is a proposal for a new Linux kernel synthetic filesystem, to be
>> mounted in /sys/kernel/stats, which exposes subsystem-level statistics
>> in sysfs. Reading need not be particularly lightweight, but writing
>> must be fast. Therefore, statistics are gathered at a fine-grain level
>> in order to avoid locking or atomic operations, and then aggregated by
>> statsfs up to the desired granularity.
>
> Wait, reading a statistic from userspace can be slow, but writing to it
> from userspace has to be fast? Or do you mean the speed is all for
> reading/writing the value within the kernel?

Reading/writing from the kernel. Reads from userspace are a superset
of reads from the kernel; writes from userspace are irrelevant.
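
To make the fast-write/slow-read split concrete, here is a minimal
userspace sketch. It assumes each slot (think one per CPU or vCPU) has
exactly one writer, so the write path is a plain increment, and the
reader pays for the walk over all slots. NR_SLOTS, struct slot_stats,
count_exit() and sum_field() are hypothetical names for illustration,
not part of any proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS 4	/* hypothetical: one slot per CPU or vCPU */

/* Hypothetical fine-grained statistics; each slot has a single writer. */
struct slot_stats {
	uint64_t exits;
	uint64_t halts;
};

static struct slot_stats slots[NR_SLOTS];

/* Write path: a plain increment, no lock and no atomic operation. */
static inline void count_exit(unsigned int slot)
{
	assert(slot < NR_SLOTS);
	slots[slot].exits++;
}

/* Read path: walk every slot and sum the uint64_t found at 'offset'. */
static uint64_t sum_field(size_t offset)
{
	uint64_t total = 0;

	for (unsigned int i = 0; i < NR_SLOTS; i++)
		total += *(const uint64_t *)((const char *)&slots[i] + offset);
	return total;
}
```

A reader would call sum_field(offsetof(struct slot_stats, exits)). With
concurrent writers the sum can be slightly stale, which is usually
acceptable for statistics; a real kernel implementation would also have
to care about torn reads on 32-bit machines.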

[...]

>> As you can see, values are basically integers stored somewhere in a
>> struct. The statsfs_value struct also includes information on which
>> operations (for example sum, min, max, average, count nonzero) it makes
>> sense to expose when the values are aggregated.
>
> What can userspace do with that info?

The basic usage is logging. A turbostat-like tool that is able to use
both debugfs and tracepoints is already in tools/kvm/kvm_stat.
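
As a rough illustration of "integers stored somewhere in a struct" plus
a list of permitted aggregations, something along these lines would do.
This is only a sketch: struct value_desc, enum agg_kind, aggregate() and
struct vcpu_stats are made-up names, and the real statsfs_value layout
may well look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aggregation kinds, following the list in the quoted text. */
enum agg_kind { AGG_SUM, AGG_MIN, AGG_MAX, AGG_AVG, AGG_COUNT_NONZERO };

/* Hypothetical descriptor: a name, where the integer lives, allowed ops. */
struct value_desc {
	const char *name;
	size_t offset;		/* offset of a uint64_t in the source struct */
	unsigned int allowed;	/* bitmask of (1u << agg_kind) */
};

/* Hypothetical source struct, e.g. one instance per vCPU. */
struct vcpu_stats {
	uint64_t exits;
	uint64_t halt_ns;
};

/* Aggregate one described value over 'n' sources laid out 'stride' apart. */
static uint64_t aggregate(const struct value_desc *d, enum agg_kind kind,
			  const void *sources, size_t stride, size_t n)
{
	uint64_t acc = 0;

	/* Refuse empty input and operations the descriptor does not allow. */
	assert(n > 0 && (d->allowed & (1u << kind)));

	for (size_t i = 0; i < n; i++) {
		const char *base = (const char *)sources + i * stride;
		uint64_t v = *(const uint64_t *)(base + d->offset);

		switch (kind) {
		case AGG_SUM:
		case AGG_AVG:
			acc += v;
			break;
		case AGG_MIN:
			if (i == 0 || v < acc)
				acc = v;
			break;
		case AGG_MAX:
			if (v > acc)
				acc = v;
			break;
		case AGG_COUNT_NONZERO:
			acc += (v != 0);
			break;
		}
	}
	/* The average uses integer division and therefore truncates. */
	return kind == AGG_AVG ? acc / n : acc;
}
```

A logging tool then only needs the descriptor to know that, say, "exits"
can be summed across vCPUs while a latency value is better shown as a
maximum or an average.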

> I have some old notes somewhere about what people really want when it
> comes to a good "statistics" datatype, that I was thinking of building
> off of, but that seems independent of what you are doing here, right?
> This is just exporting existing values to userspace in a semi-sane way?

For KVM, yes. But while I'm at it, I'd like the subsystem to be useful
for others, so if you can dig out those notes I can integrate them.

> Anyway, I like the idea, but what about how this is exposed to
> userspace? The criticism of sysfs for statistics is that it is too slow
> to open/read/close lots of files and tough to get "at this moment in
> time these are all the different values" snapshots easily. How will
> this be addressed here?

Individual files in sysfs *should* be the first format to export
statsfs, since quick scripts are an important use case. However, another
advantage of having a higher-level API is that other ways to access the
stats can be added transparently.

The main requirement for that is self-descriptiveness; blindly passing
structs to userspace is certainly the worst format of all. But I don't
like the idea of JSON or anything ASCII; that adds overhead to both
production and parsing, for no particular reason. Tracepoints already
do something like that to export arguments, so perhaps there is room to
reuse code or at least some ideas. It could be binary sysfs files
(again like tracing) or netlink; I haven't thought about it at all.
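
Just to show what "self-descriptive but binary" could mean, here is a toy
encoding where every record carries its own name next to the value, so a
reader does not need to know any struct layout in advance. The record
layout and the put_record()/get_record() helpers are invented for this
example; this is not a proposed ABI and not the tracepoint format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout: [u8 name_len][name bytes][u64 value, LE]. */

/* Append one record to 'buf'; returns bytes written, or 0 if it won't fit. */
static size_t put_record(uint8_t *buf, size_t cap, const char *name,
			 uint64_t value)
{
	size_t len, need;

	assert(buf != NULL && name != NULL);
	len = strlen(name);
	need = 1 + len + 8;
	if (len > 255 || need > cap)
		return 0;

	buf[0] = (uint8_t)len;
	memcpy(buf + 1, name, len);
	for (int i = 0; i < 8; i++)	/* fixed little-endian byte order */
		buf[1 + len + i] = (uint8_t)(value >> (8 * i));
	return need;
}

/*
 * Parse one record; returns bytes consumed, or 0 on truncated input.
 * 'name' must have room for 256 bytes (255 characters plus the NUL).
 */
static size_t get_record(const uint8_t *buf, size_t avail, char *name,
			 uint64_t *value)
{
	size_t len;

	if (avail < 1)
		return 0;
	len = buf[0];
	if (avail < 1 + len + 8)
		return 0;

	memcpy(name, buf + 1, len);
	name[len] = '\0';
	*value = 0;
	for (int i = 0; i < 8; i++)
		*value |= (uint64_t)buf[1 + len + i] << (8 * i);
	return 1 + len + 8;
}
```

A consumer loops over get_record() until it returns 0. A single read of
such a buffer would also go some way towards the "all values at this
moment in time" snapshot that separate files make difficult.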

Paolo