Re: [PATCH v4 00/16] Overhaul multi-page lookups for THP

From: Matthew Wilcox
Date: Tue Nov 17 2020 - 14:16:03 EST


On Tue, Nov 17, 2020 at 08:26:03AM -0800, Hugh Dickins wrote:
> On Tue, 17 Nov 2020, Matthew Wilcox wrote:
> > On Mon, Nov 16, 2020 at 02:34:34AM -0800, Hugh Dickins wrote:
> > > Fix to [PATCH v4 15/16] mm/truncate,shmem: Handle truncates that split THPs.
> > > One machine ran fine, swapping and building in ext4 on loop0 on huge tmpfs;
> > > one machine got occasional pages of zeros in its .os; one machine couldn't
> > > get started because of ext4_find_dest_de errors on the newly mkfs'ed fs.
> > > The partial_end case was decided by PAGE_SIZE, when there might be a THP
> > > there. The below patch has run well (for not very long), but I could
> > > easily have got it slightly wrong, off-by-one or whatever; and I have
> > > not looked into the similar code in mm/truncate.c, maybe that will need
> > > a similar fix or maybe not.
> >
> > Thank you for the explanation in your later email! There is indeed an
> > off-by-one, although in the safe direction.
> >
> > > --- 5103w/mm/shmem.c 2020-11-12 15:46:21.075254036 -0800
> > > +++ 5103wh/mm/shmem.c 2020-11-16 01:09:35.431677308 -0800
> > > @@ -874,7 +874,7 @@ static void shmem_undo_range(struct inod
> > > long nr_swaps_freed = 0;
> > > pgoff_t index;
> > > int i;
> > > - bool partial_end;
> > > + bool same_page;
> > >
> > > if (lend == -1)
> > > end = -1; /* unsigned, so actually very big */
> > > @@ -907,16 +907,12 @@ static void shmem_undo_range(struct inod
> > > index++;
> > > }
> > >
> > > - partial_end = ((lend + 1) % PAGE_SIZE) > 0;
> > > + same_page = (lstart >> PAGE_SHIFT) == end;
> >
> > 'end' is exclusive, so this is always false. Maybe something "obvious":
> >
> > same_page = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
> >
> > (lend is inclusive, so lend in 0-4095 are all on the same page)
>
> My brain is not yet in gear this morning, so I haven't given this the
> necessary thought: but I do have to question what you say there, and
> throw it back to you for the further thought -
>
> the first shmem_getpage(inode, lstart >> PAGE_SHIFT, &page, SGP_READ);
> the second shmem_getpage(inode, end, &page, SGP_READ).
> So same_page = (lstart >> PAGE_SHIFT) == end
> had seemed right to me.
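To make the disagreement above concrete: under the old definitions, end = (lend + 1) >> PAGE_SHIFT is an exclusive bound only when lend + 1 is page aligned; otherwise it is the index of the page holding the partial tail, which is exactly what the second shmem_getpage() fetches. The following is a minimal userspace model, not kernel code. It assumes 4 KiB pages, ignores the lend == -1 "to end of file" case, and the helper names (old_start(), old_end(), same_page_getpage_idx(), same_page_inclusive()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE ((int64_t)1 << PAGE_SHIFT)

typedef unsigned long pgoff_t;  /* page index, as in the kernel */

/* Old rule for start: first page fully inside the range (rounded up). */
static pgoff_t old_start(int64_t lstart)
{
	assert(lstart >= 0);
	return (pgoff_t)((lstart + PAGE_SIZE - 1) >> PAGE_SHIFT);
}

/* Old rule for end: (lend + 1) rounded down. Exclusive when lend + 1 is
 * page aligned; otherwise the index of the page holding the partial tail. */
static pgoff_t old_end(int64_t lend)
{
	assert(lend >= 0);  /* lend == -1 is not modelled */
	return (pgoff_t)((lend + 1) >> PAGE_SHIFT);
}

/* The test in the v4 fix: compare the two shmem_getpage() indices. */
static bool same_page_getpage_idx(int64_t lstart, int64_t lend)
{
	return (pgoff_t)(lstart >> PAGE_SHIFT) == old_end(lend);
}

/* The suggested test: do the first and last bytes share a base page? */
static bool same_page_inclusive(int64_t lstart, int64_t lend)
{
	return (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
}
```

The two tests agree whenever lend + 1 is not page aligned, e.g. bytes 10..100; they only diverge for a page-aligned tail such as bytes 10..4095, where the first is false and the second is true.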

I find both of these functions exceptionally confusing. Does this
make it easier to understand?

@@ -859,22 +859,47 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
{
struct address_space *mapping = inode->i_mapping;
struct shmem_inode_info *info = SHMEM_I(inode);
- pgoff_t start = (lstart + PAGE_SIZE - 1) >> PAGE_SHIFT;
- pgoff_t end = (lend + 1) >> PAGE_SHIFT;
+ pgoff_t start = lstart >> PAGE_SHIFT;
+ pgoff_t end = lend >> PAGE_SHIFT;
struct pagevec pvec;
pgoff_t indices[PAGEVEC_SIZE];
struct page *page;
long nr_swaps_freed = 0;
pgoff_t index;
int i;
- bool same_page;
+ bool same_page = (start == end);

- if (lend == -1)
- end = -1; /* unsigned, so actually very big */
+ page = NULL;
+ shmem_getpage(inode, start, &page, SGP_READ);
+ if (page) {
+ page = thp_head(page);
+ same_page = lend < page_offset(page) + thp_size(page);
+ set_page_dirty(page);
+ if (truncate_inode_partial_page(page, lstart, lend))
+ start++;
+ else
+ start = page->index + thp_nr_pages(page);
+ unlock_page(page);
+ put_page(page);
+ page = NULL;
+ }
+
+ if (!same_page)
+ shmem_getpage(inode, end, &page, SGP_READ);
+ if (page) {
+ page = thp_head(page);
+ set_page_dirty(page);
+ if (truncate_inode_partial_page(page, lstart, lend))
+ end--;
+ else
+ end = page->index - 1;
+ unlock_page(page);
+ put_page(page);
+ }

pagevec_init(&pvec);
index = start;
- while (index < end && find_lock_entries(mapping, index, end - 1,
+ while (index <= end && find_lock_entries(mapping, index, end,
&pvec, indices)) {
for (i = 0; i < pagevec_count(&pvec); i++) {
page = pvec.pages[i];
@@ -900,40 +925,11 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
index++;
}

- same_page = (lend >> PAGE_SHIFT) == (lstart >> PAGE_SHIFT);
- page = NULL;
- shmem_getpage(inode, lstart >> PAGE_SHIFT, &page, SGP_READ);
- if (page) {
- page = thp_head(page);
- same_page = lend < page_offset(page) + thp_size(page);
- set_page_dirty(page);
- if (!truncate_inode_partial_page(page, lstart, lend)) {
- start = page->index + thp_nr_pages(page);
- if (same_page)
- end = page->index;
- }
- unlock_page(page);
- put_page(page);
- page = NULL;
- }
-
- if (!same_page)
- shmem_getpage(inode, end, &page, SGP_READ);
- if (page) {
- page = thp_head(page);
- set_page_dirty(page);
- if (!truncate_inode_partial_page(page, lstart, lend))
- end = page->index;
- unlock_page(page);
- put_page(page);
- }
-
index = start;
- while (index < end) {
+ while (index <= end) {
cond_resched();

- if (!find_get_entries(mapping, index, end - 1, &pvec,
- indices)) {
+ if (!find_get_entries(mapping, index, end, &pvec, indices)) {
/* If all gone or hole-punch or unfalloc, we're done */
if (index == start || end != -1)
break;

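That is, we change the definitions of start and end to the more natural
"index of the page which contains the first/last byte", so both bounds are
now inclusive (hence the loops test index <= end). Then we deal with the
partial pages at the start and end of the range, and adjust start and end
to skip whatever those calls have already handled.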
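A minimal userspace sketch of the new convention, under stated assumptions: 4 KiB pages, both edge pages present in the page cache, truncate_inode_partial_page() succeeding on them, and no lend == -1 case. It converts an inclusive byte range into inclusive page indices and models the small-page path of the diff, where both edge pages are handed to the partial-page step regardless of alignment and then excluded from the main loop. struct page_range, byte_range_to_pages(), pages_in_range() and middle_after_edges() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

typedef unsigned long pgoff_t;

/* Both bounds are the index of the page containing the first / last
 * byte, so the range is inclusive. */
struct page_range {
	pgoff_t start;
	pgoff_t end;
};

static struct page_range byte_range_to_pages(int64_t lstart, int64_t lend)
{
	assert(0 <= lstart && lstart <= lend);  /* lend == -1 not modelled */
	struct page_range r = {
		.start = (pgoff_t)(lstart >> PAGE_SHIFT),
		.end = (pgoff_t)(lend >> PAGE_SHIFT),
	};
	return r;
}

/* Number of indices an inclusive "index <= end" loop visits. */
static unsigned long pages_in_range(struct page_range r)
{
	return r.start > r.end ? 0 : r.end - r.start + 1;
}

/* Small-page path: the start page is always handled (start++), and the
 * end page too unless it is the same page (end--). */
static struct page_range middle_after_edges(int64_t lstart, int64_t lend)
{
	struct page_range r = byte_range_to_pages(lstart, lend);
	bool same_page = (r.start == r.end);

	r.start++;
	if (!same_page)
		r.end--;  /* cannot underflow: end > start >= 0 here */
	return r;
}
```

For bytes 0..12287 (three whole pages) the middle loop is left with only page 1, since pages 0 and 2 were already given to the partial-page step; for a range inside one page the middle loop runs zero times.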

I almost managed to get rid of 'same_page' until I thought about the case
where the page at start is a compound page and the split succeeded. In that
case, the end of the range lies inside the same compound page, so we have
already dealt with it and don't want to deal with it again.
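The compound-page case can be modelled the same way. This is a hedged sketch, not kernel code: struct thp, thp_containing(), lend_in_same_thp(), next_start() and next_end() are hypothetical stand-ins, it assumes 4 KiB base pages and a naturally aligned compound page, and it only mirrors the arithmetic of the diff (lend < page_offset(page) + thp_size(page), plus the start and end updates after truncate_inode_partial_page()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB base pages */

typedef unsigned long pgoff_t;

/* Hypothetical stand-in for a compound page: nr_pages base pages,
 * naturally aligned, starting at the head page's index. */
struct thp {
	pgoff_t index;           /* like page->index of the head */
	unsigned long nr_pages;  /* like thp_nr_pages(); 512 for 2 MiB */
};

/* The compound page of nr_pages (a power of two) that covers idx. */
static struct thp thp_containing(pgoff_t idx, unsigned long nr_pages)
{
	assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);
	struct thp t = {
		.index = idx & ~(pgoff_t)(nr_pages - 1),
		.nr_pages = nr_pages,
	};
	return t;
}

/* Mirrors: same_page = lend < page_offset(page) + thp_size(page); */
static bool lend_in_same_thp(struct thp head, int64_t lend)
{
	int64_t offset = (int64_t)head.index << PAGE_SHIFT;
	int64_t size = (int64_t)head.nr_pages << PAGE_SHIFT;

	return lend < offset + size;
}

/* Mirrors the start update: one page on success, otherwise past the
 * whole compound page. */
static pgoff_t next_start(struct thp head, pgoff_t start, bool ok)
{
	return ok ? start + 1 : head.index + head.nr_pages;
}

/* Mirrors the end update: one page back on success, otherwise back to
 * the page just before the compound page. */
static pgoff_t next_end(struct thp head, pgoff_t end, bool ok)
{
	return ok ? end - 1 : head.index - 1;
}
```

With a 512-page compound page covering indices 512..1023, lstart in page 700 and lend in page 1000 give start != end in base pages, yet lend_in_same_thp() is true, so the second shmem_getpage() call is skipped.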