Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap

From: Mina Almasry
Date: Tue Nov 23 2021 - 16:47:51 EST


On Tue, Nov 23, 2021 at 1:30 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Tue, Nov 23, 2021 at 01:10:37PM -0800, Mina Almasry wrote:
> > On Tue, Nov 23, 2021 at 12:51 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Nov 22, 2021 at 04:01:02PM -0800, Mina Almasry wrote:
> > > > Add PM_THP_MAPPED to allow userspace to detect whether a given virt
> > > > address is currently mapped by a transparent huge page or not. An example
> > > > use case is a process requesting THPs from the kernel (via a huge tmpfs
> > > > mount, for example) for a performance-critical region of memory.
> > > > Userspace may want to query whether the kernel is actually backing this
> > > > memory with hugepages or not.
> > >
> > > So you want this bit to be clear if the memory is backed by a hugetlb
> > > page?
> > >
> >
> > Yes, I believe so. I do not see value in telling userspace that the
> > virt address is backed by a hugetlb page: if the memory is mapped
> > with MAP_HUGETLB or is backed by a hugetlb file, then it is backed
> > by hugetlb pages and there is no ambiguity from the kernel here.
> >
> > Additionally, hugetlb interfaces are size-based rather than PMD-based.
> > arm64, for example, supports 64K, 2MB, 32MB and 1G 'huge' pages, and
> > it's an implementation detail that those sizes are mapped by CONTIG PTE,
> > PMD, CONTIG PMD, and PUD respectively. The specific mapping
> > mechanism is typically not exposed to userspace and might not be
> > stable, so assuming pagemap_hugetlb_range() == PMD_MAPPED would not
> > technically be correct.
>
> What I've been trying to communicate over the N reviews of this
> patch series is that *the same thing is about to happen to THPs*.
> Only more so. THPs are going to be of arbitrary power-of-two size, not
> necessarily sizes supported by the hardware. That means that we need to
> be extremely precise about what we mean by "is this a THP?" Do we just
> mean "This is a compound page?" Do we mean "this is mapped by a PMD?"
> Or do we mean something else? And I feel like I haven't been able to
> get that information out of you.
>

Yes, and I'm very sorry for the trouble, but I'm also confused about what
the disconnect is. To allocate hugepages, I can do this:

mount -t tmpfs -o huge=always tmpfs /mnt/mytmpfs

or

madvise(..., MADV_HUGEPAGE)
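As a rough illustration of the madvise() route above, here is a minimal userspace sketch, not code from this patch. It assumes a 2 MiB PMD size (x86-64, or arm64 with 4 KiB pages); the helper alloc_thp_candidate() and the THP_PMD_SIZE constant are hypothetical names. MADV_HUGEPAGE is only a hint, so the kernel may still back the range with small pages, which is why a way to query the result is wanted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: the PMD size is 2 MiB (x86-64, or arm64 with 4 KiB pages). */
#define THP_PMD_SIZE ((size_t)2 << 20)

/*
 * Hypothetical helper: map `len` bytes of anonymous memory starting on a
 * THP_PMD_SIZE boundary and hint MADV_HUGEPAGE. `len` must be a multiple
 * of THP_PMD_SIZE. Returns NULL if mmap() fails; *madv_ret receives the
 * madvise() result (-1, e.g. on kernels built without THP support).
 */
static uint8_t *alloc_thp_candidate(size_t len, int *madv_ret)
{
    assert(len > 0 && len % THP_PMD_SIZE == 0);

    /* Over-allocate so that an aligned window of `len` bytes must exist. */
    size_t span = len + THP_PMD_SIZE;
    uint8_t *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    /* Round the start up to the next PMD boundary. */
    uintptr_t start = ((uintptr_t)raw + THP_PMD_SIZE - 1) &
                      ~(uintptr_t)(THP_PMD_SIZE - 1);
    uint8_t *aligned = (uint8_t *)start;

    /* Give back the unaligned head and the unused tail. */
    size_t head = (size_t)(aligned - raw);
    size_t tail = span - head - len;
    if (head)
        munmap(raw, head);
    if (tail)
        munmap(aligned + len, tail);

    /* Only a hint: the kernel may still back the range with small pages. */
    *madv_ret = madvise(aligned, len, MADV_HUGEPAGE);

    /* Fault in each PMD-sized chunk so the kernel can try a huge page. */
    for (size_t off = 0; off < len; off += THP_PMD_SIZE)
        aligned[off] = 1;
    return aligned;
}
```

Aligning the start to the PMD size matters because a PMD-mapped THP can only cover a naturally aligned 2 MiB range that lies entirely inside the VMA.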

Note that I don't ask the kernel for a specific size or a specific mapping
mechanism (PMD/contig PTE/contig PMD/PUD); I just ask the kernel for
'huge' pages. I would like to know whether the kernel succeeded in
allocating a hugepage or not. AFAICT, a THP today is PMD-mapped and
satisfies is_transparent_hugepage(), which is the check I have here. In
the future, THPs may become an arbitrary power-of-two size, and I think
I'll need to update this querying interface if and when that is merged
into the kernel. I.e., if in the future I allocate pages by using:

mount -t tmpfs -o huge=2MB tmpfs /mnt/mytmpfs

then I need the kernel to tell me whether the mapping is 2MB in size or not.

If I allocate pages by using:

mount -t tmpfs -o huge=pmd tmpfs /mnt/mytmpfs

Then I need the kernel to tell me whether the pages are PMD mapped or
not, as I'm doing here.
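On the querying side, a minimal sketch of how userspace could read one /proc/self/pagemap entry and test such a bit. Each virtual page has a 64-bit entry at byte offset (vaddr / page_size) * 8, and bit 63 means "page present" per the kernel's pagemap documentation. The THP bit position used below (58) is an assumption for illustration only, not taken from this patch, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 63 ("page present") is documented in the kernel's pagemap docs. */
#define PM_PRESENT (1ULL << 63)
/*
 * HYPOTHETICAL position for the THP bit discussed in this thread; 58 is
 * assumed purely for illustration. Check the kernel's pagemap docs for
 * any real assignment before relying on a bit number.
 */
#define PM_THP_MAPPED_ASSUMED (1ULL << 58)

/* Byte offset of the 64-bit pagemap entry that describes `vaddr`. */
static off_t pagemap_offset(uintptr_t vaddr, long page_size)
{
    assert(page_size > 0);
    return (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/* Read the raw pagemap entry for `vaddr` in the calling process. */
static bool read_pagemap_entry(uintptr_t vaddr, uint64_t *entry)
{
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return false;
    ssize_t n = pread(fd, entry, sizeof(*entry),
                      pagemap_offset(vaddr, sysconf(_SC_PAGESIZE)));
    close(fd);
    return n == (ssize_t)sizeof(*entry);
}

/* True if the page is present and carries the (assumed) THP bit. */
static bool entry_is_thp_mapped(uint64_t entry)
{
    return (entry & PM_PRESENT) && (entry & PM_THP_MAPPED_ASSUMED);
}
```

A caller would touch the region first, call read_pagemap_entry() on an address inside it, and then check entry_is_thp_mapped(). The kernel zeroes the PFN bits for readers without CAP_SYS_ADMIN, so only flag bits like these are meaningful to an unprivileged process.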

The current implementation is based on the current THP implementation
in the kernel, and depending on future changes to THP, I may need to
update it. Does that make sense?
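To make the arm64 example quoted earlier concrete: with a 4 KiB granule, each of the four hugetlb sizes (64K, 2MB, 32MB, 1G) falls out of a different mapping mechanism, which is why a size does not imply "PMD-mapped". The sketch below only does that arithmetic; the constant and function names are hypothetical, and the contiguous-run count of 16 applies to the 4 KiB granule only.

```c
#include <assert.h>
#include <stdint.h>

/* arm64 with a 4 KiB granule; names are hypothetical, values are not. */
#define ARM64_PAGE_SHIFT 12
#define ARM64_PTRS_PER_TABLE 512 /* one 4 KiB table of 8-byte descriptors */
#define ARM64_CONT_PTES 16       /* contiguous-hint run, 4 KiB granule only */
#define ARM64_CONT_PMDS 16

static_assert(ARM64_PTRS_PER_TABLE == (1 << ARM64_PAGE_SHIFT) / 8,
              "a table holds one page worth of 8-byte descriptors");

static uint64_t pte_size(void)      { return 1ULL << ARM64_PAGE_SHIFT; }          /* 4K  */
static uint64_t cont_pte_size(void) { return ARM64_CONT_PTES * pte_size(); }      /* 64K */
static uint64_t pmd_size(void)      { return ARM64_PTRS_PER_TABLE * pte_size(); } /* 2M  */
static uint64_t cont_pmd_size(void) { return ARM64_CONT_PMDS * pmd_size(); }      /* 32M */
static uint64_t pud_size(void)      { return ARM64_PTRS_PER_TABLE * pmd_size(); } /* 1G  */
```

Only the 2MB size corresponds to a single PMD entry; 64K and 32MB are runs of contiguous PTEs and PMDs, and 1G is a PUD.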