Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility

From: Hugh Dickins
Date: Fri Jun 07 2019 - 07:01:50 EST


On Thu, 6 Jun 2019, Yang Shi wrote:
> On 5/7/19 10:10 AM, Yang Shi wrote:
> > On 5/7/19 3:47 AM, Michal Hocko wrote:
> > > [Hmm, I thought, Hugh was CCed]
> > >
> > > On Mon 06-05-19 16:37:42, Yang Shi wrote:
> > > >
> > > > On 4/28/19 12:13 PM, Yang Shi wrote:
> > > > >
> > > > > On 4/23/19 10:52 AM, Michal Hocko wrote:
> > > > > > On Wed 24-04-19 00:43:01, Yang Shi wrote:
> > > > > > > The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility
> > > > > > > for each
> > > > > > > vma") introduced THPeligible bit for processes' smaps. But, when
> > > > > > > checking
> > > > > > > the eligibility for shmem vma, __transparent_hugepage_enabled()
> > > > > > > is
> > > > > > > called to override the result from shmem_huge_enabled(). It may
> > > > > > > result
> > > > > > > in the anonymous vma's THP flag override shmem's. For example,
> > > > > > > running a
> > > > > > > simple test which create THP for shmem, but with anonymous THP
> > > > > > > disabled,
> > > > > > > when reading the process's smaps, it may show:
> > > > > > >
> > > > > > > 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
> > > > > > > Size:ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 4096 kB
> > > > > > > ...
> > > > > > > [snip]
> > > > > > > ...
> > > > > > > ShmemPmdMapped:ÂÂÂÂ 4096 kB
> > > > > > > ...
> > > > > > > [snip]
> > > > > > > ...
> > > > > > > THPeligible:ÂÂÂ 0
> > > > > > >
> > > > > > > And, /proc/meminfo does show THP allocated and PMD mapped too:
> > > > > > >
> > > > > > > ShmemHugePages:ÂÂÂÂ 4096 kB
> > > > > > > ShmemPmdMapped:ÂÂÂÂ 4096 kB
> > > > > > >
> > > > > > > This doesn't make too much sense. The anonymous THP flag should
> > > > > > > not
> > > > > > > intervene shmem THP. Calling shmem_huge_enabled() with checking
> > > > > > > MMF_DISABLE_THP sounds good enough. And, we could skip stack and
> > > > > > > dax vma check since we already checked if the vma is shmem
> > > > > > > already.
> > > > > > Kirill, can we get a confirmation that this is really intended
> > > > > > behavior
> > > > > > rather than an omission please? Is this documented? What is a
> > > > > > global
> > > > > > knob to simply disable THP system wise?
> > > > > Hi Kirill,
> > > > >
> > > > > Ping. Any comment?
> > > > Talked with Kirill at LSFMM, it sounds this is kind of intended
> > > > behavior
> > > > according to him. But, we all agree it looks inconsistent.
> > > >
> > > > So, we may have two options:
> > > > ÂÂÂÂ - Just fix the false negative issue as what the patch does
> > > > ÂÂÂÂ - Change the behavior to make it more consistent
> > > >
> > > > I'm not sure whether anyone relies on the behavior explicitly or
> > > > implicitly
> > > > or not.
> > > Well, I would be certainly more happy with a more consistent behavior.
> > > Talked to Hugh at LSFMM about this and he finds treating shmem objects
> > > separately from the anonymous memory. And that is already the case
> > > partially when each mount point might have its own setup. So the primary
> > > question is whether we need a one global knob to controll all THP
> > > allocations. One argument to have that is that it might be helpful to
> > > for an admin to simply disable source of THP at a single place rather
> > > than crawling over all shmem mount points and remount them. Especially
> > > in environments where shmem points are mounted in a container by a
> > > non-root. Why would somebody wanted something like that? One example
> > > would be to temporarily workaround high order allocations issues which
> > > we have seen non trivial amount of in the past and we are likely not at
> > > the end of the tunel.
> >
> > Shmem has a global control for such use. Setting shmem_enabled to "force"
> > or "deny" would enable or disable THP for shmem globally, including non-fs
> > objects, i.e. memfd, SYS V shmem, etc.
> >
> > >
> > > That being said I would be in favor of treating the global sysfs knob to
> > > be global for all THP allocations. I will not push back on that if there
> > > is a general consensus that shmem and fs in general are a different
> > > class of objects and a single global control is not desirable for
> > > whatever reasons.
> >
> > OK, we need more inputs from Kirill, Hugh and other folks.
>
> [Forgot cc to mailing lists]
>
> Hi guys,
>
> How should we move forward for this one? Make the sysfs knob
> (/sys/kernel/mm/transparent_hugepage/enabled) to be global for both anonymous
> and tmpfs? Or just treat shmem objects separately from anon memory then fix
> the false-negative of THP eligibility by this patch?

Sorry for not getting back to you sooner on this.

I don't like to drive design by smaps. I agree with the word "mess" used
several times of THP tunings in this thread, but it's too easy to make
that mess worse by unnecessary changes, so I'm very cautious here.

The addition of "THPeligible" without an "Anon" in its name was
unfortunate. I suppose we're two releases too late to change that.

Applying process (PR_SET_THP_DISABLE) and mm (MADV_*HUGEPAGE)
limitations to shared filesystem objects doesn't work all that well.

I recommend that you continue to treat shmem objects separately from
anon memory, and just make the smaps "THPeligible" more often accurate.

Is your v2 patch earlier in this thread the best for that?
No answer tonight, I'll re-examine later in the day.

Hugh