Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned

From: Linus Torvalds
Date: Mon Sep 28 2020 - 15:30:20 EST


On Mon, Sep 28, 2020 at 11:39 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> I prefer the version where read pin and write pin are symmetric. The
> PTE in the MM should not change once pinned.

The thing is, I don't really see how to do that.

Right now the write pin fastpath part depends on the PTE being
writable. That implies "this VM has access to this page".

For a read pin there simply is no other way to do it.

So we'd basically say "fast read pin only works on writable pages",
and then we'd have to go to the slow path if it isn't dirty and
writable.
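
As a rough illustration of that rule (a userspace toy, not kernel code; the struct, enum, and function names are made up for this sketch, and the "present" check is an added assumption), the fast-path decision would look only at the PTE bits:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy PTE holding only the bits discussed here. Not the kernel's pte_t. */
struct toy_pte {
    bool present;
    bool writable;
    bool dirty;
};

enum toy_pin_path { TOY_PIN_FAST, TOY_PIN_SLOW };

/*
 * Hypothetical policy: a read pin may use the fast path only when the
 * PTE is already writable and dirty. Everything else falls back to the
 * slow path, which would do whatever COW is required.
 */
enum toy_pin_path toy_read_pin_path(struct toy_pte pte)
{
    if (pte.present && pte.writable && pte.dirty)
        return TOY_PIN_FAST;
    return TOY_PIN_SLOW;
}

/* Small usage example: a read-only PTE never qualifies for the fast path. */
void toy_read_pin_selfcheck(void)
{
    struct toy_pte rw_dirty  = { true, true, true };
    struct toy_pte read_only = { true, false, false };

    assert(toy_read_pin_path(rw_dirty) == TOY_PIN_FAST);
    assert(toy_read_pin_path(read_only) == TOY_PIN_SLOW);
}
```

Note that nothing in this check can tell a shared read-only mapping from a private one; it only ever sees the PTE.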

And the slow path would then do whatever COW is required, but it
wouldn't mark the result dirty (and in the case of a shared mapping,
couldn't mark it writable).

So a read pin action would basically never work for the fast-path for
a few cases, notably a shared read-only mapping - because we could
never mark it in the page tables as "fast pin accessible".

See the problem? A read-only pin is fundamentally different from a
write one, because a write one has that fundamental mark of "I have
private access to this page" in ways a read one simply does not.

So we could make the requirement be that a pinned page is either

(a) from a shared mapping (so the pinning depends on the page cache
association). But we can't test this in the fast path.

or

(b) for a private mapping we require page_mapcount() == 1 and that
it's writable.

but since (a) requires the mapping type, we can't check it in the fast
path - we only have the PTE and the page. So the fast-path can only
"emulate" it by that "writable" test, which is a proper subset of (a) or
(b), but it's not something that is in any way guaranteed.
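
To make the gap between the full requirement and what the fast path can see concrete, here is a small userspace model. It is only a sketch under stated assumptions: the types and function names are hypothetical, the mapping type is modelled as a plain enum, and none of this is kernel API.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_mapping { TOY_SHARED, TOY_PRIVATE };

/* Toy state; the mapping type is information the fast path does not have. */
struct toy_state {
    enum toy_mapping mapping;
    int mapcount;          /* stands in for page_mapcount() */
    bool pte_writable;
};

/* Full requirement: (a) shared mapping, or (b) private, mapcount == 1, writable. */
bool toy_pin_rule(struct toy_state s)
{
    if (s.mapping == TOY_SHARED)
        return true;
    return s.mapcount == 1 && s.pte_writable;
}

/* Fast path: only the PTE and the page are available, so it tests "writable". */
bool toy_fast_path_ok(struct toy_state s)
{
    return s.pte_writable;
}

/* Usage example contrasting the two checks. */
void toy_pin_selfcheck(void)
{
    struct toy_state shared_ro  = { TOY_SHARED, 3, false };
    struct toy_state private_rw = { TOY_PRIVATE, 1, true };

    /* A shared read-only mapping meets (a), yet the fast path rejects it. */
    assert(toy_pin_rule(shared_ro) && !toy_fast_path_ok(shared_ro));
    /* A private, singly mapped, writable page passes both checks. */
    assert(toy_pin_rule(private_rw) && toy_fast_path_ok(private_rw));
}
```

The shared read-only case is the one that can never take the fast path in this model, since there is no PTE bit that marks it as pinnable.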

End result: FOLL_PIN would really only work on private pages, and only
if you don't want to share with the page cache.

And it would basically have no advantages over a writable FOLL_PIN. It
would break the association with any backing store for private pages,
because otherwise it can't follow future writes.

Linus