Re: Extending page pinning into fs/direct-io.c

From: David Hildenbrand
Date: Wed May 24 2023 - 03:08:09 EST

On 23.05.23 22:16, David Howells wrote:
Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

But can you please also take care of the legacy direct I/O code? I'd really
hate to leave yet another unfinished transition around.

I've been poking at it this afternoon, but it doesn't look like it's going to
be straightforward, unfortunately. The mm folks have been withdrawing access
to the pinning API behind the ramparts of the mm/ dir. Further, the dio code
will (I think), under some circumstances, arbitrarily insert the zero_page
into a list of things that are maybe pinned or maybe unpinned, but I can (I
think) also be given a pinned zero_page from the GUP code if the page tables
point to one and a DIO-write is requested - so just doing if page == zero_page
isn't sufficient.

What I'd like to do is to make the GUP code not take a ref on the zero_page
if, say, FOLL_DONT_PIN_ZEROPAGE is passed in, and then make the bio cleanup
code always ignore the zero_page.

We discussed doing that unconditionally in the context of vfio (below), but vfio
decided to add a workaround suitable for stable.

In case of FOLL_PIN it's simple: if we detect the zeropage, don't mess with the
refcount when pinning and don't mess with the refcount when unpinning (esp.
unpin_user_pages). FOLL_GET is a different story but we don't have to mess with

So there shouldn't be need for a FOLL_DONT_PIN_ZEROPAGE, we could just do it

Alternatively, I can drop the pin immediately if I get given one on the
zero_page - it's not going anywhere, after all.

That's what vfio did in

commit 873aefb376bbc0ed1dd2381ea1d6ec88106fdbd4
Author: Alex Williamson <alex.williamson@xxxxxxxxxx>
Date: Mon Aug 29 21:05:40 2022 -0600

vfio/type1: Unpin zero pages
There's currently a reference count leak on the zero page. We increment
the reference via pin_user_pages_remote(), but the page is later handled
as an invalid/reserved page, therefore it's not accounted against the
user and not unpinned by our put_pfn().
Introducing special zero page handling in put_pfn() would resolve the
leak, but without accounting of the zero page, a single user could
still create enough mappings to generate a reference count overflow.
The zero page is always resident, so for our purposes there's no reason
to keep it pinned. Therefore, add a loop to walk pages returned from
pin_user_pages_remote() and unpin any zero pages.

For vfio that handling no longer required, because FOLL_LONGTERM will never pin
the shared zeropage.


David / dhildenb