Re: Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end
From: Horst Birthelmer
Date: Wed Mar 18 2026 - 10:26:44 EST
On Mon, Mar 16, 2026 at 03:06:02PM -0700, Joanne Koong wrote:
> On Mon, Mar 16, 2026 at 1:02 PM Horst Birthelmer <horst@xxxxxxxxxxxxx> wrote:
> >
> > >
> > > Hi Horst,
> > >
> > > I think these are two different entities. cs->pg is the page that
> > > corresponds to the userspace buffer / pipe while the (large) folio
> > > corresponds to the pages in the page cache. flush_dcache_folio(folio)
> > > and flush_dcache_page(cs->pg) are not interchangeable (I don't think
> > > it's likely either that the pages backing the userspace buffer/pipe
> > > are large folios).
> > >
> > > Thanks,
> > > Joanne
> >
> > Hi Joanne,
> >
> > I feel a bit embarrassed ... but you are completely right.
> > I was interested in solving this case:
> >
> > fuse_uring_args_to_ring() or fuse_uring_args_to_ring_pages()
> >   fuse_copy_init(&cs, true, &iter) ← cs->write = TRUE
> >   fuse_copy_args(&cs, num_args, args->in_pages, ...)
> >     if (args->in_pages)
> >       fuse_copy_folios(cs, arg->size, 0)
> >         fuse_copy_folio(cs, &ap->folios[i], ...)
> >
> > when we have large folios
>
> No worries, the naming doesn't make the distinction obvious at all.
> For copying out large folios right now, the copy is still page by page
> due to extracting 1 userspace buffer page at a time (eg the
> iov_iter_get_pages2(... PAGE_SIZE, 1, ...) call in fuse_copy_fill()).
> If we pass in a pages array, iov_iter_get_pages2 is able to extract
> multiple pages at a time and save extra overhead with the GUP setup /
> irq save+restore / pagetable walk and the extra req->waitq
> locking/unlocking calls, but when I benchmarked it last year I didn't
> see any noticeable performance improvements from doing this. The extra
> complexity didn't seem worth it. For optimized copying, I think in the
> future high-performance servers will mostly just use fuse-over-iouring
> zero-copy.
>
> Thanks,
> Joanne
>
Hi Joanne,
I wonder, would something like this help for large folios?
@@ -856,8 +856,11 @@ void fuse_copy_finish(struct fuse_copy_state *cs)
 		cs->currbuf = NULL;
 	} else if (cs->pg) {
 		if (cs->write) {
+			struct folio *folio = page_folio(cs->pg);
+
 			flush_dcache_page(cs->pg);
-			set_page_dirty_lock(cs->pg);
+			if (!folio_test_dirty(folio))
+				set_page_dirty_lock(cs->pg);
 		}
 		put_page(cs->pg);
 	}
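A rough userspace model of what the check is meant to save, for the case where consecutive cs->pg pages come from the same (large) folio. It is not kernel code: struct model_folio and the model_* helpers are hypothetical, and a counter of lock round trips stands in for the folio lock that set_page_dirty_lock() takes around each call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only, not kernel code. All names are hypothetical.
 * A model_folio groups nr_pages pages that share one dirty flag,
 * like the dirty bit of a large folio.
 */
struct model_folio {
	size_t nr_pages;
	bool dirty;
	unsigned long lock_round_trips; /* stands in for folio lock/unlock cost */
};

static void model_folio_init(struct model_folio *f, size_t nr_pages)
{
	assert(nr_pages > 0);
	f->nr_pages = nr_pages;
	f->dirty = false;
	f->lock_round_trips = 0;
}

/* Stand-in for set_page_dirty_lock(): lock, mark dirty, unlock. */
static void model_set_page_dirty_lock(struct model_folio *f)
{
	f->lock_round_trips++;
	f->dirty = true;
}

/*
 * One fuse_copy_finish()-style call per copied page of the folio.
 * With check_dirty set, the unlocked test from the patch skips the
 * locked call once the shared dirty flag is already set.
 * Single-threaded model, so the unlocked read is safe here.
 * Returns the number of lock round trips taken.
 */
static unsigned long model_copy_folio(struct model_folio *f, bool check_dirty)
{
	for (size_t i = 0; i < f->nr_pages; i++) {
		if (check_dirty && f->dirty)
			continue; /* folio_test_dirty() would be true: skip */
		model_set_page_dirty_lock(f);
	}
	return f->lock_round_trips;
}
```

For a 16-page folio, model_copy_folio(&f, false) takes the lock 16 times, while model_copy_folio(&f, true) takes it once. With order-0 pages the model gives one round trip per page either way, and since the pages behind the userspace buffer are usually not large folios, any real saving would have to be confirmed by benchmarking.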
Have you seen any problems with spin locks being way too costly while
doing writes?
That was actually why I started looking into this.
Thanks,
Horst