RE: [RFC PATCH] block, fs: use FOLL_LONGTERM as gup_flags for direct IO
From: Sooyong Suk
Date: Fri Mar 07 2025 - 01:38:55 EST
> On Thu, Mar 6, 2025 at 6:07 PM Sooyong Suk <s.suk@xxxxxxxxxxx> wrote:
> >
> > > On Fri, Mar 7, 2025 at 12:26 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Mar 06, 2025 at 04:40:56PM +0900, Sooyong Suk wrote:
> > > > > There are GUP references to pages that are serving as direct IO buffers.
> > > > > Those pages can be allocated from CMA pageblocks even though they
> > > > > can remain pinned until the DIO completes.
> > > >
> > > > direct I/O is exactly the case that is not FOLL_LONGTERM and one of
> > > > the reasons to even have the flag. So big fat no to this.
> > > >
> > >
> >
> > Understood.
> >
> > > Hello, thank you for your comment.
> > > We, Sooyong and I, wanted to get some opinions on using FOLL_LONGTERM
> > > for direct I/O, because we found CMA memory containing pages that had
> > > been pinned by direct I/O.
> > >
> > > > You also completely failed to address the relevant mailing list and
> > > > maintainers.
> > >
> > > I have added the block maintainer, Jens Axboe, and the block layer
> > > mailing list here, and have also added Suren and Sandeep.
>
> I'm very far from being a block layer expert :)
>
> >
> > Then, what do you think of using PF_MEMALLOC_PIN for this context, as
> > below? This only clears __GFP_MOVABLE from the gfp flags of allocations
> > made inside this context.
> > Since __bio_iov_iter_get_pages() indicates that it will pin user or
> > kernel pages, there seems to be no reason not to use this process flag.
>
> I think this will help you only when the pages are faulted in, but if
> __get_user_pages() finds an already mapped page which happens to be
> allocated from CMA, it will not migrate it. So you might still end up
> with unmovable pages inside CMA.
>
Yes, you're right.
However, it at least prevents the problem in the fault-in case and reduces
the overall probability of CMA allocation failure. Also, the pinned pages
that we observed from snapuserd were allocated on fault-in.
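For illustration only, here is a toy userspace C model of the limitation discussed above. It is not kernel code: struct toy_page, toy_pin() and TOY_GFP_MOVABLE are hypothetical, and it assumes as a worst case that every movable allocation lands in a CMA pageblock. The in_pin_scope argument stands in for PF_MEMALLOC_PIN, which only changes the gfp mask used when a page is faulted in; the longterm argument stands in for FOLL_LONGTERM, which migrates CMA pages out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: all names and values are hypothetical.
 * Worst-case assumption: every movable allocation is placed in CMA.
 */
#define TOY_GFP_MOVABLE 0x08u

struct toy_page {
	bool mapped;  /* already present in the user page table */
	bool in_cma;  /* backed by a CMA pageblock */
};

/*
 * Pin one user page. in_pin_scope models PF_MEMALLOC_PIN being set;
 * longterm models FOLL_LONGTERM, which migrates CMA pages out.
 * Returns true if the pinned page ends up inside CMA.
 */
static bool toy_pin(struct toy_page *p, bool in_pin_scope, bool longterm)
{
	unsigned int gfp = TOY_GFP_MOVABLE;  /* user pages are normally movable */

	assert(p != NULL);
	if (in_pin_scope)
		gfp &= ~TOY_GFP_MOVABLE;     /* the scope clears the movable bit */

	if (!p->mapped) {
		/* Fault-in: the new page obeys the (possibly stripped) gfp mask. */
		p->mapped = true;
		p->in_cma = (gfp & TOY_GFP_MOVABLE) != 0;
	}

	/* An already-mapped CMA page is left in place unless longterm is set. */
	if (longterm && p->in_cma)
		p->in_cma = false;           /* models migration out of CMA */

	return p->in_cma;
}
```

In this model, toy_pin() on an unmapped page inside the scope returns false (the fault-in allocation avoids CMA), while on a page that is already mapped from CMA it returns true unless longterm is set, which matches the point that the process flag only covers the fault-in case.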
> >
> > block/bio.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/block/bio.c b/block/bio.c
> > index 65c796ecb..671e28966 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -1248,6 +1248,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> >  	unsigned len, i = 0;
> >  	size_t offset;
> >  	int ret = 0;
> > +	unsigned int flags;
> >
> >  	/*
> >  	 * Move page array up in the allocated memory for the bio vecs as far as
> > @@ -1267,9 +1268,11 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> >  	 * result to ensure the bio's total size is correct. The remainder of
> >  	 * the iov data will be picked up in the next bio iteration.
> >  	 */
> > +	flags = memalloc_pin_save();
> >  	size = iov_iter_extract_pages(iter, &pages,
> >  				      UINT_MAX - bio->bi_iter.bi_size,
> >  				      nr_pages, extraction_flags,
> >  				      &offset);
> > +	memalloc_pin_restore(flags);
> >  	if (unlikely(size <= 0))
> >  		return size ? size : -EFAULT;
> >
> >
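A minimal userspace C sketch of what the memalloc_pin_save()/memalloc_pin_restore() pair in the patch does, modeled on the scope helpers in include/linux/sched/mm.h and on current_gfp_context(). This is not the kernel implementation: the model_* names, the bit values and the current_flags variable (standing in for current->flags) are hypothetical, and the PF_MEMALLOC_NOIO/NOFS handling of the real current_gfp_context() is omitted. The save function returns only the previous state of the pin bit, so nested scopes restore correctly.

```c
#include <assert.h>

/*
 * Userspace model of a PF_MEMALLOC_PIN scope. Not kernel code: all names
 * and bit values below are hypothetical stand-ins for the kernel's
 * PF_MEMALLOC_PIN, __GFP_MOVABLE, __GFP_IO and current->flags.
 */
typedef unsigned int model_gfp_t;

#define MODEL_PF_MEMALLOC_PIN 0x10000000u
#define MODEL_GFP_IO          0x40u
#define MODEL_GFP_MOVABLE     0x08u
/* Simplified user-page allocation request: may do IO, may be movable. */
#define MODEL_GFP_HIGHUSER_MOVABLE (MODEL_GFP_IO | MODEL_GFP_MOVABLE)

/* Stands in for current->flags of the running task. */
static unsigned int current_flags;

/* Enter a pin scope; return only the previous state of the pin bit. */
static unsigned int model_memalloc_pin_save(void)
{
	unsigned int old = current_flags & MODEL_PF_MEMALLOC_PIN;

	current_flags |= MODEL_PF_MEMALLOC_PIN;
	return old;
}

/* Leave the scope by restoring just the pin bit saved on entry. */
static void model_memalloc_pin_restore(unsigned int saved)
{
	assert((saved & ~MODEL_PF_MEMALLOC_PIN) == 0);
	current_flags = (current_flags & ~MODEL_PF_MEMALLOC_PIN) | saved;
}

/* Effective gfp mask: a pin scope strips the movable bit, and only that. */
static model_gfp_t model_current_gfp_context(model_gfp_t gfp)
{
	if (current_flags & MODEL_PF_MEMALLOC_PIN)
		gfp &= ~MODEL_GFP_MOVABLE;
	return gfp;
}
```

Inside the scope, model_current_gfp_context() strips the movable bit from MODEL_GFP_HIGHUSER_MOVABLE while leaving the other bit (MODEL_GFP_IO) untouched; after the outermost restore, the movable bit is honored again. Note that nothing here touches pages that are already mapped, which is the limitation raised earlier in the thread.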