Re: [PATCH v5 0/3] cachestat: a new syscall for page cache state of files

From: Nhat Pham
Date: Wed Jan 04 2023 - 18:22:39 EST


On Wed, Jan 4, 2023 at 3:11 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> Changelog:
> v5:
> * Separate first patch into its own series.
> (suggested by Andrew Morton)
> * Expose filemap_cachestat() to non-syscall usage
> (patch 2) (suggested by Brian Foster).
> * Fix some build errors from last version.
> (patch 2)
> * Explain eviction and recent eviction in the draft man page and
> documentation (suggested by Andrew Morton).
> (patch 2)
> v4:
> * Refactor cachestat and move it to mm/filemap.c (patch 3)
> (suggested by Brian Foster)
> * Remove redundant checks (!folio, access_ok)
> (patch 3) (suggested by Matthew Wilcox and Al Viro)
> * Fix a bug in handling multi-page folios.
> (patch 3) (suggested by Matthew Wilcox)
> * Add a selftest for shmem files, which can be used to test huge
> pages (patch 4) (suggested by Johannes Weiner)
> v3:
> * Fix some minor formatting issues and build errors.
> * Add the new syscall entry to missing architecture syscall tables.
> (patch 3).
> * Add flags argument for the syscall. (patch 3).
> * Clean up the recency refactoring (patch 2) (suggested by Yu Zhao)
> * Add the new Kconfig (CONFIG_CACHESTAT) to disable the syscall.
> (patch 3) (suggested by Josh Triplett)
> v2:
> * len == 0 means query to EOF. len < 0 is invalid.
> (patch 3) (suggested by Brian Foster)
> * Make cachestat extensible by adding the `cstat_size` argument in the
> syscall (patch 3)
>
> There is currently no good way to query the page cache state of large
> file sets and directory trees. There is mincore(), but it scales poorly:
> the kernel writes out a lot of bitmap data that userspace has to
> aggregate, when the user really does not care about per-page information
> in that case. The user also needs to mmap and unmap each file as it goes
> along, which can be quite slow as well.
>
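
For readers unfamiliar with the pattern criticized above, here is a minimal
sketch of counting cached pages with mincore(). The helper name
count_cached_pages_mincore() is made up for illustration and is not part of
this series. It maps the whole file, asks the kernel for one status byte per
page, and sums the residency bits in userspace, which is the per-file
mmap/aggregate/munmap overhead the cover letter describes.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for mincore() on glibc */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not from the patch series):
 * return the number of pages of `path` resident in the page cache,
 * or -1 on error.
 */
long count_cached_pages_mincore(const char *path)
{
	long page_size = sysconf(_SC_PAGESIZE);
	struct stat st;
	long cached = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	if (st.st_size == 0) {
		/* Nothing to map; mmap() rejects zero-length mappings. */
		close(fd);
		return 0;
	}

	size_t len = (size_t)st.st_size;
	size_t npages = (len + page_size - 1) / page_size;

	/* The file has to be mapped before mincore() can inspect it. */
	void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* The kernel writes one byte per page into this vector. */
	unsigned char *vec = malloc(npages);
	if (vec && mincore(addr, len, vec) == 0) {
		cached = 0;
		/* Userspace aggregation: bit 0 is set for resident pages. */
		for (size_t i = 0; i < npages; i++)
			cached += vec[i] & 1;
	}

	free(vec);
	munmap(addr, len);
	close(fd);
	return cached;
}
```

Every file in a directory tree pays for the mmap(), the per-page vector, and
the munmap(), even when only the total is wanted.
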
> This series of patches introduces a new system call, cachestat, that
> summarizes the page cache statistics (number of cached pages, dirty
> pages, pages marked for writeback, evicted pages etc.) of a file, in a
> specified range of bytes. It also includes a selftest suite that covers
> some typical use cases.
>
> This interface is inspired by past discussion of, and concerns with,
> fincore, which has a design similar to mincore's (and, as a result,
> similar issues).
> Relevant links:
>
> https://lkml.indiana.edu/hypermail/linux/kernel/1302.1/04207.html
> https://lkml.indiana.edu/hypermail/linux/kernel/1302.1/04209.html
>
> For comparison with mincore, I ran both syscalls on a 2TB sparse file:
>
> Using mincore:
> real 0m37.510s
> user 0m2.934s
> sys 0m34.558s
>
> Using cachestat:
> real 0m0.009s
> user 0m0.000s
> sys 0m0.009s
>
> This series should be applied on top of:
>
> workingset: fix confusion around eviction vs refault container
> https://lkml.org/lkml/2023/1/4/1066
>
> This series consists of 3 patches:
>
> Nhat Pham (3):
> workingset: refactor LRU refault to expose refault recency check
> cachestat: implement cachestat syscall
> selftests: Add selftests for cachestat
>
> MAINTAINERS | 7 +
> arch/alpha/kernel/syscalls/syscall.tbl | 1 +
> arch/arm/tools/syscall.tbl | 1 +
> arch/ia64/kernel/syscalls/syscall.tbl | 1 +
> arch/m68k/kernel/syscalls/syscall.tbl | 1 +
> arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
> arch/parisc/kernel/syscalls/syscall.tbl | 1 +
> arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
> arch/s390/kernel/syscalls/syscall.tbl | 1 +
> arch/sh/kernel/syscalls/syscall.tbl | 1 +
> arch/sparc/kernel/syscalls/syscall.tbl | 1 +
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
> include/linux/fs.h | 3 +
> include/linux/swap.h | 1 +
> include/linux/syscalls.h | 3 +
> include/uapi/asm-generic/unistd.h | 5 +-
> include/uapi/linux/mman.h | 9 +
> init/Kconfig | 10 +
> kernel/sys_ni.c | 1 +
> mm/filemap.c | 143 ++++++++++
> mm/workingset.c | 129 ++++++---
> tools/testing/selftests/Makefile | 1 +
> tools/testing/selftests/cachestat/.gitignore | 2 +
> tools/testing/selftests/cachestat/Makefile | 8 +
> .../selftests/cachestat/test_cachestat.c | 259 ++++++++++++++++++
> 27 files changed, 555 insertions(+), 39 deletions(-)
> create mode 100644 tools/testing/selftests/cachestat/.gitignore
> create mode 100644 tools/testing/selftests/cachestat/Makefile
> create mode 100644 tools/testing/selftests/cachestat/test_cachestat.c
>
>
> base-commit: 1440f576022887004f719883acb094e7e0dd4944
> prerequisite-patch-id: 171a43d333e1b267ce14188a5beaea2f313787fb
> --
> 2.30.2

Oops, I think I accidentally sent out v5 twice :(

Please ignore the first set of emails and review this one only.

LKML link for convenience:
https://lkml.org/lkml/2023/1/4/1095