Re: [PATCH v8 00/10] iov_iter: Improve page extraction (pin or just list)
From: David Hildenbrand
Date: Tue Jan 24 2023 - 07:45:16 EST
On 23.01.23 18:29, David Howells wrote:
> Hi Al, Christoph,
>
> Here are patches to provide support for extracting pages from an iov_iter
> and to use this in the extraction functions in the block layer bio code.
>
> The patches make the following changes:
>
>  (1) Add a function, iov_iter_extract_pages(), to replace
>      iov_iter_get_pages*() that gets refs, pins or just lists the pages as
>      appropriate to the iterator type.
>
>      Add a function, iov_iter_extract_mode(), that will indicate from the
>      iterator type how the cleanup is to be performed, returning FOLL_PIN
>      or 0.
>
>  (2) Add a function, folio_put_unpin(), and a wrapper, page_put_unpin(),
>      that take a page and the return from iov_iter_extract_mode() and do
>      the right thing to clean up the page.
>
>  (3) Make the bio struct carry a pair of flags to indicate the cleanup
>      mode.  BIO_NO_PAGE_REF is replaced with BIO_PAGE_REFFED (equivalent to
>      FOLL_GET) and BIO_PAGE_PINNED (equivalent to FOLL_PIN) is
>      added.
>
>  (4) Add a function, bio_release_page(), to release a page appropriately to
>      the cleanup mode indicated by the BIO_PAGE_* flags.
>
>  (5) Make the iter-to-bio code use iov_iter_extract_pages() to retain the
>      pages appropriately and clean them up later.
>
>  (6) Fix bio_flagged() so that it doesn't prevent a gcc optimisation.
>
>  (7) Renumber FOLL_PIN and FOLL_GET down so that they're at bits 0 and 1
>      and coincident with BIO_PAGE_PINNED and BIO_PAGE_REFFED.  The compiler
>      can then optimise on that.  Also, it's probably going to be necessary
>      to embed these in the page pointer in sk_buff fragments.  This patch
>      can go independently through the mm tree.
^ I feel like some of that information might be stale now that you're
only using FOLL_PIN.
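To make the cleanup-mode idea from items (1) and (2) concrete, here is a userspace toy model. It is only a sketch: every toy_* and TOY_* name is hypothetical, the flag value is arbitrary, and it assumes that user-backed iterators get pinned while kernel-backed ones are merely listed. It is not the kernel implementation.

```c
/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define TOY_FOLL_PIN 0x1u

/* Assumed split: user-backed vs. kernel-backed iterators. */
enum toy_iter_type { TOY_ITER_USER, TOY_ITER_KERNEL };

struct toy_page {
	int pincount;	/* toy counter, not the real struct page layout */
};

/* Report how extracted pages must be cleaned up: TOY_FOLL_PIN or 0. */
static unsigned int toy_extract_mode(enum toy_iter_type type)
{
	return type == TOY_ITER_USER ? TOY_FOLL_PIN : 0;
}

/* "Extract" one page: take a pin only if the mode asks for one. */
static void toy_extract_page(struct toy_page *page, unsigned int mode)
{
	if (mode & TOY_FOLL_PIN)
		page->pincount++;
}

/* Cleanup mirrors extraction: drop the pin only if one was taken. */
static void toy_page_put_unpin(struct toy_page *page, unsigned int mode)
{
	if (mode & TOY_FOLL_PIN)
		page->pincount--;
}
```

The point of carrying the mode around is that the release path never has to guess: whatever toy_extract_mode() returned at extraction time is handed back to toy_page_put_unpin(), so a page that was only listed is left alone.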
> I've pushed the patches here also:
>
> 	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-extract
I gave this a quick test and it indeed fixes the last remaining test
case of my O_DIRECT+fork tests [1] that was still failing on upstream
(test3).
Once this has landed upstream, and if we feel confident enough (I tend
to be), we could adjust the open(2) man page to state that O_DIRECT I/O
can now be run concurrently with fork(). In particular, the following
passage might need adjusting:
"O_DIRECT I/Os should never be run concurrently with the fork(2)
system call, if the memory buffer is a private mapping (i.e., any
mapping created with the mmap(2) MAP_PRIVATE flag; this includes memory
allocated on the heap and statically allocated buffers). Any such
I/Os, whether submitted via an asynchronous I/O interface or from
another thread in the process, should be completed before fork(2) is
called. Failure to do so can result in data corruption and undefined
behavior in parent and child processes."
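For illustration, the ordering that this documentation currently asks for can be sketched as below: a private (heap) buffer, a synchronous O_DIRECT read that has completed, and only then fork(). The toy_* names and the 4096-byte alignment are assumptions made for the sketch; the real alignment requirement depends on the device and filesystem, and O_DIRECT may be rejected outright on some filesystems.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc exposes O_DIRECT only with this */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0		/* headers don't expose it: falls back to buffered I/O */
#endif

#define TOY_DIO_ALIGN 4096	/* assumed alignment, see lead-in */

/* Allocate an aligned heap buffer, i.e. private anonymous memory. */
static void *toy_alloc_dio_buffer(size_t len)
{
	void *buf = NULL;

	if (posix_memalign(&buf, TOY_DIO_ALIGN, len) != 0)
		return NULL;
	memset(buf, 0, len);
	return buf;
}

/* Read with O_DIRECT, then fork. Returns bytes read, or -1 on any error. */
static ssize_t toy_dio_read_then_fork(const char *path, void *buf, size_t len)
{
	ssize_t n;
	pid_t pid;
	int fd = open(path, O_RDONLY | O_DIRECT);

	if (fd < 0)
		return -1;
	n = read(fd, buf, len);	/* returns only once the I/O is complete */
	close(fd);
	if (n < 0)
		return -1;

	pid = fork();		/* no O_DIRECT I/O in flight at this point */
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);	/* child: nothing to do in this sketch */
	waitpid(pid, NULL, 0);
	return n;
}
```

The problematic case is the opposite ordering, where another thread or an asynchronous interface still has a read into such a buffer in flight when fork() runs.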
This series does not yet fix vmsplice()+hugetlb ... simply because your
series does not touch the vmsplice() implementation, I assume ;) Once
vmsplice() uses FOLL_PIN, all cow tests should pass as well. This is
easy to test (as root):
$ cd tools/testing/selftests/vm/
$ echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ echo 2 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
$ ./cow
...
Bail out! 8 out of 190 tests failed
# Totals: pass:181 fail:8 xfail:0 xpass:0 skip:1 error:0
[1] https://gitlab.com/davidhildenbrand/o_direct_fork_tests
--
Thanks,
David / dhildenb