Re: [PATCH v10 2/3] cachestat: implement cachestat syscall
From: Nhat Pham
Date: Sun Mar 05 2023 - 05:24:52 EST
On Thu, Mar 2, 2023 at 11:03 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Thu, Mar 02, 2023 at 10:55:48PM -0800, Nhat Pham wrote:
> > On Sun, Feb 19, 2023 at 4:21 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > > +/**
> > > > + * filemap_cachestat() - compute the page cache statistics of a mapping
> > > > + * @mapping: The mapping to compute the statistics for.
> > > > + * @first_index: The starting page cache index.
> > > > + * @last_index: The final page index (inclusive).
> > > > + * @cs: the cachestat struct to write the result to.
> > > > + *
> > > > + * This will query the page cache statistics of a mapping in the
> > > > + * page range of [first_index, last_index] (inclusive). The statistics
> > > > + * queried include: number of dirty pages, number of pages marked for
> > > > + * writeback, and the number of (recently) evicted pages.
> > > > + */
> > >
> > > Do we care that this isn't going to work for hugetlbfs?
> >
> > I ran a quick test using hugetlbfs. It looks like the current
> > implementation is treating it in accordance to the multi-page
> > folio case we discussed earlier, i.e:
> >
> > - Returned number of "pages" is in terms of the number of
> > base/small pages (i.e 512 dirty pages instead of 1 dirty
> > huge page etc.)
> > - If we touch one byte in the huge page, it would report the
> > entire huge page as dirty, but again in terms of the underlying
> > pages.
> >
> > Is this what you have in mind, or is there another edge
> > case that I'm missing...?
>
> Hugetlbfs indexes its pages by hugepage number rather than by smallpage
> number. Imagine you have a 2MB folio at offset 4MB into the file.
> Filesystems other than hugetlbfs store it at indices 1024-1535.
> hugetlbfs stores it at index 2.
>
> So your report probably seems to work, but if you ask it about a
> range, you might be surprised by how wide that range will cover for
> hugetlbfs.
>
> I know Sidhartha is working on fixing that, but I'm not sure if what he
> has is working yet.
Oh I see! Thanks for letting us know about this, Matthew.
In that case, I think we should just drop support for hugetlbfs for now.
It's also an odd ball - not swappable, no backing files, not fully
under user management, etc., so it's less interesting with respect to
cachestat as well.
In the future, we can always lift this restriction with a follow-up patch
or with Sidhartha's fix.