Re: [PATCH v11 0/3] cachestat: a new syscall for page cache state of files

From: Andrew Morton
Date: Tue Mar 14 2023 - 19:00:52 EST


On Tue, 7 Mar 2023 19:27:45 -0800 Nhat Pham <nphamcs@xxxxxxxxx> wrote:

> There is currently no good way to query the page cache state of large
> file sets and directory trees. There is mincore(), but it scales poorly:
> the kernel writes out a lot of bitmap data that userspace has to
> aggregate, when the user really does not care about per-page information
> in that case. The user also needs to mmap and unmap each file as it goes
> along, which can be quite slow as well.
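
For reference, the mincore() pattern described above looks roughly like
the sketch below. It is only an illustration: count_resident_pages() is a
hypothetical helper name, not something from the patch series, and error
handling is kept minimal. Every file costs an mmap(), a bitmap of one byte
per page, a userspace loop to add it up, and an munmap().

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of one file are resident in
 * the page cache using mincore(). Returns the resident page count, or -1
 * on error. The file's total page count is stored in *total_pages.
 */
long count_resident_pages(const char *path, long *total_pages)
{
    long resident = -1;
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t pages = (len + page_size - 1) / page_size;

    *total_pages = (long)pages;
    if (pages == 0) {           /* mmap() rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    /* The file has to be mapped just so mincore() has an address range. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The kernel writes one byte per page; bit 0 means "resident". */
    unsigned char *vec = malloc(pages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < pages; i++)
            resident += vec[i] & 1;   /* userspace does the aggregation */
    }

    free(vec);
    munmap(map, len);
    close(fd);
    return resident;
}
```

Walking a directory tree means repeating all of this per file, which is
the scaling problem the cover letter is complaining about.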

A while ago I asked about the security implications - could cachestat()
be used to figure out what parts of a file another user is reading?
This also applies to mincore(), but cachestat() newly permits user A to
work out which parts of a file user B has *written* to.

I don't recall seeing a response to this, and there is no discussion in
the changelogs.


Secondly, I'm not seeing a description of any use cases. OK, it's faster
and better than mincore(), but who cares? In other words, what
end-user value compels us to add this feature to Linux?


> struct cachestat {
> __u64 nr_cache;
> __u64 nr_dirty;
> __u64 nr_writeback;
> __u64 nr_evicted;
> __u64 nr_recently_evicted;
> };

And these fields are really getting into the weedy details of internal
kernel implementation. Bear in mind that we must support this API for
ever.

Particularly the "evicted" things. The workingset code was implemented
eight years ago, which is actually relatively recent. It could be that
eight years from now the workingset code will have been removed and
possibly replaced with something else. Then what do we do?

For these reasons, and because of the lack of enthusiasm I have seen
from others, I don't think a case has yet been made for the addition of
this new syscall.