Re: [PATCH 0/5] mm: Break COW for pinned pages during fork()
From: Leon Romanovsky
Date: Wed Sep 23 2020 - 06:21:25 EST
On Mon, Sep 21, 2020 at 05:17:39PM -0400, Peter Xu wrote:
> I'm finally posting formal patches because the series keeps growing. We've
> also discussed quite a few issues already, so I feel it's clearer what we need
> to do, and how.
> This series is largely inspired by the previous discussion on the list,
> starting from Jason's report of the rdma test failure. Linus proposed
> the solution, which seems to be a very nice approach to avoid breaking
> userspace apps that didn't use MADV_DONTFORK properly before. More information
> can be found in that thread too.
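
For readers unfamiliar with MADV_DONTFORK, here is a minimal userspace sketch of what "using it properly" usually means: the application marks a buffer it intends to pin for DMA so that fork() does not duplicate the mapping into the child. The helper name alloc_dma_buffer() is hypothetical and is not taken from the thread; only mmap(), madvise() and MADV_DONTFORK are real interfaces.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch series): allocate an anonymous
 * private buffer and mark it MADV_DONTFORK, so a later fork() does not
 * make the range available to the child and the parent's pages are never
 * subject to copy-on-write because of that child.
 * Returns NULL on failure.
 */
void *alloc_dma_buffer(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Ask the kernel not to inherit this range across fork(). */
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}
```

Applications that skip the madvise() call are the ones this series tries to keep working.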
> I believe the initial plan was to consider merging something like this for
> rc7/rc8. However, I'm no longer sure, because the code change in
> copy_pte_range() is probably larger than expected, so it carries some risk.
> I'll leave this question to the reviewers...
> I tested it myself with fork() after vfio pinning a bunch of device pages, and
> I verified that the new copy pte logic worked as expected, at least in the most
> general path. However, I haven't tested the thp case yet because afaict vfio
> does not support thp backed dma pages. Luckily, the pmd/pud thp patch is much
> more straightforward than the pte one, so hopefully it can be directly verified
> by some code review plus some more heavy-weight rdma tests.
> Patch 1: Introduce mm.has_pinned (as single patch as suggested by Jason)
> Patch 2-3: Some slight rework on copy_page_range() path as preparation
> Patch 4: Early cow solution for pte copy for pinned pages
> Patch 5: Same as above, but for thp (pmd/pud).
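
The idea behind patches 1 and 4 can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the code from the series: every sim_* type and function below is a hypothetical stand-in, the pin check is reduced to a simple counter, and giving the child a writable private copy is a simplification chosen for the model.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096

/* Hypothetical, simplified stand-ins; NOT the kernel's structures. */
struct sim_page {
    unsigned char data[SIM_PAGE_SIZE];
    int pin_count;  /* stand-in for "this page may be pinned for DMA" */
    int map_count;  /* how many simulated page tables map this page */
};

struct sim_mm {
    bool has_pinned; /* models the flag described for patch 1 */
};

struct sim_pte {
    struct sim_page *page;
    bool writable;
};

/*
 * Copy one parent entry into the child at fork time.
 * Returns 0 on success, -1 if the early copy could not be allocated.
 */
int sim_copy_one_pte(const struct sim_mm *src_mm,
                     struct sim_pte *src, struct sim_pte *dst)
{
    if (src_mm->has_pinned && src->page->pin_count > 0) {
        /*
         * Early COW: copy the page for the child right now. The parent
         * keeps the original page and its write permission, so ongoing
         * DMA keeps targeting memory the parent can still see.
         */
        struct sim_page *copy = malloc(sizeof(*copy));
        if (!copy)
            return -1;
        memcpy(copy->data, src->page->data, SIM_PAGE_SIZE);
        copy->pin_count = 0;
        copy->map_count = 1;
        dst->page = copy;
        dst->writable = true; /* assumption made for this model */
        return 0;
    }

    /* Usual path: share the page and write-protect both sides. */
    src->writable = false;
    dst->page = src->page;
    dst->writable = false;
    src->page->map_count++;
    return 0;
}
```

In this model, an unpinned page is shared and write-protected as before, and only a page that looks pinned in an mm with has_pinned set takes the copying path.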
> The hugetlbfs fix is still missing, but as planned, that's not urgent, so we
> can work on it later. Comments are greatly welcome.
I'm aware that this series is under ongoing review and probably not
final, but we tested it anyway and it solves our RDMA failures.
> Peter Xu (5):
>   mm: Introduce mm_struct.has_pinned
>   mm/fork: Pass new vma pointer into copy_page_range()
>   mm: Rework return value for copy_one_pte()
>   mm: Do early cow for pinned pages during fork() for ptes
>   mm/thp: Split huge pmds/puds if they're pinned when fork()
>
>  include/linux/mm.h       |   2 +-
>  include/linux/mm_types.h |  10 ++
>  kernel/fork.c            |   3 +-
>  mm/gup.c                 |   6 ++
>  mm/huge_memory.c         |  26 +++++
>  mm/memory.c              | 226 +++++++++++++++++++++++++++++++++++----
>  6 files changed, 248 insertions(+), 25 deletions(-)