Re: Extending page pinning into fs/direct-io.c

From: David Howells
Date: Wed May 24 2023 - 04:48:13 EST


Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

> > What I'd like to do is to make the GUP code not take a ref on the zero_page
> > if, say, FOLL_DONT_PIN_ZEROPAGE is passed in, and then make the bio cleanup
> > code always ignore the zero_page.
>
> I don't think that'll work, as we can't mix different pin vs get types
> in a bio. And that's really a good thing.

True - but I was thinking of just treating the zero_page specially and never
holding a pin or a ref on it. It can be checked by address, e.g.:

	static inline void bio_release_page(struct bio *bio, struct page *page)
	{
		if (page == ZERO_PAGE(0))
			return;
		if (bio_flagged(bio, BIO_PAGE_PINNED))
			unpin_user_page(page);
		else if (bio_flagged(bio, BIO_PAGE_REFFED))
			put_page(page);
	}

I'm slightly concerned about the possibility of overflowing the refcount. The
problem is that it only takes about 2 million pins to do that (the zero_page
isn't a large folio, so each pin adds GUP_PIN_COUNTING_BIAS to its refcount) -
which is within reach of userspace. Create
an 8GiB anon mmap and do a bunch of async DIO writes from it. You won't hit
ENOMEM because it will stick ~2 million pointers to zero_page into the page
tables.

> > Something that I noticed is that the dio code seems to wangle to page bits on
> > the target pages for a DIO-read, which seems odd, but I'm not sure I fully
> > understand the code yet.
>
> I don't understand this sentence.

I was looking at this:

	static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
	{
		...
		if (dio->is_async && dio_op == REQ_OP_READ && dio->should_dirty)
			bio_set_pages_dirty(bio);
		...
	}

but looking again, the page lock is taken briefly and the dirty bit is set -
which is reasonable. However, should we be doing it before starting the I/O?

David