Re: [RFC][PATCH] iov_iter: Add ITER_MAPPING
From: Matthew Wilcox
Date: Fri Jan 24 2020 - 06:21:18 EST
On Thu, Jan 23, 2020 at 11:04:59AM +0000, David Howells wrote:
> Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> > It's perfectly legal to have compound pages in the page cache. Call
> > find_subpage(page, xas.xa_index) unconditionally.
>
> Like this?
>
> #define iterate_mapping(i, n, __v, skip, STEP) { \
> 	struct page *page; \
> 	size_t wanted = n, seg, offset; \
> 	loff_t start = i->mapping_start + skip; \
> 	pgoff_t index = start >> PAGE_SHIFT; \
> 	\
> 	XA_STATE(xas, &i->mapping->i_pages, index); \
> 	\
> 	rcu_read_lock(); \
> 	xas_for_each(&xas, page, ULONG_MAX) { \
I actually quite liked the iterator you had before; I was thinking of
wrapping it up as xas_for_each_contig().
> 		if (xas_retry(&xas, page) || xa_is_value(page)) { \
> 			WARN_ON(1); \
> 			break; \
> 		} \
Actually, xas_retry() can happen, even with the page itself pinned. It
indicates the xarray data structure changed under you while walking it
and you need to restart the walk from the top (arguably this shouldn't
be exposed to callers at all, and in the future it may not be ... it's
something inherited from the radix tree interface).
So this should be:
	if (xas_retry(&xas, page))
		continue;
	if (WARN_ON(xa_is_value(page)))
		break;
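
To illustrate why the two checks are handled differently, here is a minimal userspace sketch, not kernel code. Every name in it (mock_slot, mock_load, mock_walk, the ENTRY_* kinds) is a hypothetical stand-in rather than XArray API. It assumes a retry is transient and that looking the same index up again eventually yields the real entry, whereas a value entry is a genuine caller error that ends the walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kinds of entry a walk can see. */
enum entry_kind { ENTRY_PAGE, ENTRY_RETRY, ENTRY_VALUE };

struct mock_slot {
	enum entry_kind kind;	/* what the slot really holds */
	int retries_left;	/* lookups that report ENTRY_RETRY first */
};

/* One lookup: report a transient retry while retries_left > 0. */
static enum entry_kind mock_load(struct mock_slot *slot)
{
	if (slot->retries_left > 0) {
		slot->retries_left--;
		return ENTRY_RETRY;
	}
	return slot->kind;
}

/*
 * Walk nslots slots and return the number of page entries visited.
 * *warned is set to 1 if the walk stopped early on a value entry.
 */
static size_t mock_walk(struct mock_slot *slots, size_t nslots, int *warned)
{
	size_t index = 0, visited = 0;

	assert(slots != NULL && warned != NULL);
	*warned = 0;
	while (index < nslots) {
		enum entry_kind kind = mock_load(&slots[index]);

		if (kind == ENTRY_RETRY)
			continue;	/* transient: look the same index up again */
		if (kind == ENTRY_VALUE) {
			*warned = 1;	/* unexpected: caller promised real pages */
			break;
		}
		visited++;
		index++;
	}
	return visited;
}
```

In this model a slot that reports a retry twice still gets visited exactly once, while a value entry stops the walk at that index.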
> 		__v.bv_page = find_subpage(page, xas.xa_index); \
Yes.
> 		offset = (i->mapping_start + skip) & ~PAGE_MASK; \
> 		seg = PAGE_SIZE - offset; \
> 		__v.bv_offset = offset; \
> 		__v.bv_len = min(n, seg); \
> 		(void)(STEP); \
> 		n -= __v.bv_len; \
> 		skip += __v.bv_len; \
> 		if (n == 0) \
> 			break; \
> 	} \
> 	rcu_read_unlock(); \
> }
>
> Note that the walk is not restartable - and the array is supposed to have been
> fully populated by the caller for the range specified - so I've made it print
> a warning and end the loop if xas_retry() or xa_is_value() return true (which
> takes care of the !page case too). Possibly I could just leave it to fault in
> this case and not check.
>
> If PageHuge(page) is true, I presume I need to support that too. How do I
> find out how big the page is?
PageHuge() is only going to be true for hugetlbfs mappings. I'm OK
with not supporting those for now ... eventually I want to get rid of
the special cases in the page cache for hugetlbfs, but there's about
six other projects standing between me and that.
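
For reference, the per-page segment arithmetic in the quoted macro (offset within the page, bytes left in the page, length clamped to the remaining count) can be checked in userspace. This is only a sketch: it assumes 4KiB pages, and SKETCH_PAGE_SHIFT, struct sketch_seg and sketch_split() are hypothetical names, not anything in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch only: 4KiB pages. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	((size_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical analogue of the bv_page/bv_offset/bv_len triple. */
struct sketch_seg {
	size_t index;	/* page index, i.e. start >> PAGE_SHIFT */
	size_t offset;	/* byte offset within that page */
	size_t len;	/* bytes taken from that page */
};

/*
 * Split n bytes starting at byte position start into per-page
 * segments.  Returns the number of segments written to out,
 * which is at most max_segs.
 */
static size_t sketch_split(unsigned long long start, size_t n,
			   struct sketch_seg *out, size_t max_segs)
{
	size_t count = 0;

	assert(out != NULL);
	while (n > 0 && count < max_segs) {
		/* Same as (start & ~PAGE_MASK) in the macro. */
		size_t offset = (size_t)(start & (SKETCH_PAGE_SIZE - 1));
		size_t seg = SKETCH_PAGE_SIZE - offset;
		size_t len = n < seg ? n : seg;

		out[count].index = (size_t)(start >> SKETCH_PAGE_SHIFT);
		out[count].offset = offset;
		out[count].len = len;

		start += len;
		n -= len;
		count++;
	}
	return count;
}
```

Only the first segment can have a non-zero offset; every later segment starts on a page boundary, which is why the macro can recompute the offset from mapping_start + skip on each pass.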