Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory
From: Andrea Arcangeli
Date: Tue Nov 20 2018 - 12:04:11 EST
On Tue, Nov 20, 2018 at 12:11:22PM +0300, Kirill A. Shutemov wrote:
> On Sat, Nov 10, 2018 at 11:44:12AM -0500, Andrea Arcangeli wrote:
> > I would prefer to add intelligence to detect when COWs after fork
> > should be done at 2m or 4k granularity (in the latter case by
> > splitting the pmd before the actual COW while leaving the transhuge
> > pmd intact in the other mm), because that would save CPU (and it'd
> > automatically optimize redis). The snapshot process especially would
> > run faster as it will read with THP performance.
>
> I would argue we should switch to 4k COW everywhere. But it requires some
We could do that if MADV_HUGEPAGE is not set for example. So there
would still be a way to force the 2M cows if something benefits from
them.
For example with binaries executed in tmpfs one could want 2M cows on
MAP_PRIVATE to keep all the executable in 2MB tlbs despite the memory
loss (but then there are those libs that apparently aren't released to
load the binaries into THP anon too for the same reason and with even
higher memory waste risk as unlike tmpfs nothing can be shared if you
run multiple copies of a go large binary or something).
Certainly it would help whenever fork() is used for snapshotting
purposes, but then fork() used for snapshotting purposes doesn't look
the best mechanism possible for atomic snapshots.
It would be interesting to know which other common workloads will
benefit, for workloads that unlike fork()-for-snapshot, are already as
optimal as it can get.
> work on khugepaged side to be able to recover THP back after multiple 4k
> COW in the range. Currently khugepaged is not able to collapse PTE entires
> backed by compound page back to PMD.
Yes this is also answering Anthony question about what shall happen
after to the 4k cows on the doublemap.
The thing is, by the time khugepaged comes around, the child will
hopefully already have quit, so it would be ideal if it can understand
the anon page isn't even shared anymore, it's fully private to the
process after holding the mmap_sem for writing, so if it's not-shared
anymore and mapcount is 1, khugepaged doesn't need to do the 2M cow of
the doublemap THP at all, it just needs to flush the 4k fragment back
to the THP and drop the doublemap and convert the readonly pte entries
to a writable pmd_trans_huge (if VM_WRITE is still set).
> I have this on my todo list for long time, but...
We're also slowly making progress on the uffd-wp to offer a hopefully
way more efficient way to do the snapshot than using fork(), then the
whole fork thing won't be an issue because there will be no fork.