Re: [PATCH 1/4] mm: Trial do_wp_page() simplification

From: Jason Gunthorpe
Date: Mon Sep 14 2020 - 19:29:05 EST


On Mon, Sep 14, 2020 at 03:59:31PM -0700, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 3:55 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> >
> > Just as an aside, the RDMA stuff is also supposed to set MADV_DONTFORK
> > on these regions, so I'm a bit puzzled what is happening here.
>
> Did the fork perhaps happen _before_ , so the pages are shared when
> you do the pin?

Looking at the program, it seems there are a number of forks for exec
before and after pin_user_pages_fast(), but the parent process always
does waitpid() after the fork.

> MADV_DONTFORK doesn't mean COW doesn't happen. It just means that the
> next fork() won't be copying that memory area.

Yes, this stuff does pin_user_pages_fast() and MADV_DONTFORK
together. It sets FOLL_FORCE and FOLL_WRITE to get an exclusive copy
of the page, and MADV_DONTFORK was needed to ensure that a future fork
doesn't establish a COW that would break the DMA by moving the
physical page over to the fork. DMA should stay with the process that
called pin_user_pages_fast(). (Is MADV_DONTFORK still needed after
recent years' work on GUP etc.? It is a pretty terrible ancient thing.)
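
For reference, the userspace half of that pattern can be sketched
roughly as below. This is only an illustration and not the RDMA
library code: child_has_mapping() is a made-up helper, the buffer is a
single anonymous page, and the in-kernel pin_user_pages_fast() step is
not shown at all. It only demonstrates that a region marked
MADV_DONTFORK is absent from a child created by a later fork(), so
no COW sharing of that page can be set up by that fork.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a forked child still has the buffer mapped,
 * 0 if it does not, and -1 if the setup itself failed.
 */
int child_has_mapping(int use_dontfork)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	buf[0] = 1;	/* fault the page in before any fork */

	/* Ask the kernel not to copy this range into future children. */
	if (use_dontfork && madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return -1;
	}

	pid_t pid = fork();
	if (pid < 0) {
		munmap(buf, len);
		return -1;
	}
	if (pid == 0) {
		unsigned char vec;
		/* mincore() fails with ENOMEM if the range is unmapped. */
		_exit(mincore(buf, len, &vec) == 0 ? 1 : 0);
	}

	/* The parent always waits for the child, as the test program does. */
	int status;
	int result = -1;
	if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
		result = WEXITSTATUS(status);

	munmap(buf, len);
	return result;
}
```

Calling child_has_mapping(0) and child_has_mapping(1) from a small
main() should show the difference. Note this says nothing about forks
that happened before the madvise() call, which is the case Linus is
asking about.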

> That said, it's possible that the test cases do something invalid - or
> maybe we've broken MADV_DONTFORK - and it all just happened to work
> before.

Hmm. If the symptoms stop with this patch, should we investigate
MADV_DONTFORK?

Thanks,
Jason