Re: [PATCH 06/16] huge tmpfs: shmem_is_huge(vma, inode, index)

From: Hugh Dickins
Date: Sun Aug 01 2021 - 01:38:27 EST


On Sat, 31 Jul 2021, Hugh Dickins wrote:
> On Fri, 30 Jul 2021, Yang Shi wrote:
> > On Fri, Jul 30, 2021 at 12:42 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > >
> > > Extend shmem_huge_enabled(vma) to shmem_is_huge(vma, inode, index), so
> > > that a consistent set of checks can be applied, even when the inode is
> > > accessed through read/write syscalls (with NULL vma) instead of mmaps
> > > (the index argument is seldom of interest, but required by mount option
> > > "huge=within_size"). Clean up and rearrange the checks a little.
> > >
> > > This then replaces the checks which shmem_fault() and shmem_getpage_gfp()
> > > were making, and eliminates the SGP_HUGE and SGP_NOHUGE modes: while it's
> > > still true that khugepaged's collapse_file() at that point wants a small
> > > page, the race that might allocate it a huge page is too unlikely to be
> > > worth optimizing against (we are there *because* there was at least one
> > > small page in the way), and handled by a later PageTransCompound check.
> >
> > Yes, it seems too unlikely. But if it happens, the PageTransCompound
> > check may not be good enough, since the page allocated by
> > shmem_getpage() may be charged to the wrong memcg (root memcg). And it
> > won't be replaced by a newly allocated huge page, so the wrong charge
> > can't be undone.
>
> Good point on the memcg charge: I hadn't thought of that. Of course
> it's not specific to SGP_CACHE versus SGP_NOHUGE (this patch), but I
> admit that a huge mischarge is hugely worse than a small mischarge.

Stupid me (and maybe I haven't given this enough consideration yet):
but, much better than SGP_NOHUGE, much better than SGP_CACHE, would be
SGP_READ there, wouldn't it? Needs to beware of the NULL too, of course.
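
To make the NULL caveat concrete, here is a toy userspace model. It is
not kernel code and not part of this patch. It assumes SGP_READ keeps
its usual meaning of a lookup that never allocates, so a hole comes back
as success with a NULL page. Every name in it (model_getpage,
model_collapse_lookup, struct model_file and so on) is hypothetical and
exists only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy userspace model; none of these names are kernel identifiers. */
enum model_sgp { MODEL_SGP_READ, MODEL_SGP_CACHE };

#define MODEL_NPAGES 8

struct model_page {
	int huge;	/* nonzero if this stands in for a huge page */
};

struct model_file {
	struct model_page *slot[MODEL_NPAGES];	/* NULL means a hole */
};

/*
 * Returns 0 on success, -1 on allocation failure.
 * MODEL_SGP_READ never allocates: a hole yields success with *pagep == NULL.
 * MODEL_SGP_CACHE fills the hole (always with a small page in this toy).
 */
static int model_getpage(struct model_file *f, size_t index,
			 struct model_page **pagep, enum model_sgp sgp)
{
	assert(index < MODEL_NPAGES);
	if (!f->slot[index] && sgp == MODEL_SGP_CACHE) {
		f->slot[index] = calloc(1, sizeof(*f->slot[index]));
		if (!f->slot[index])
			return -1;
	}
	*pagep = f->slot[index];
	return 0;
}

/*
 * Caller in the style of a collapse loop: returns 1 only if a small page
 * is already present at index, and 0 otherwise.
 */
static int model_collapse_lookup(struct model_file *f, size_t index)
{
	struct model_page *page;

	if (model_getpage(f, index, &page, MODEL_SGP_READ))
		return 0;
	if (!page)		/* the NULL that must not be dereferenced */
		return 0;
	return !page->huge;
}
```

In this model a read-style lookup can neither allocate a page nor charge
anything, which is the attraction; the cost is that every caller has to
check for the NULL page before touching it.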

Hugh