Re: [PATCH] gfs2: Revert "Fix loop in gfs2_rbm_find"

From: Andreas Gruenbacher
Date: Thu Jan 31 2019 - 14:12:33 EST


On Thu, 31 Jan 2019 at 19:41, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Jan 30, 2019 at 12:30 PM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
> >
> > This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34.
> >
> > It turns out that the fix can lead to a ~20 percent performance regression
> > in initial writes to the page cache according to iozone. Let's revert this
> > for now to have more time for a proper fix.
>
> Ugh. I think part of the problem is that the
>
> n += (rbm->bii - initial_bii);
>
> doesn't make any sense when we just set rbm->bii to 0 a couple of
> lines earlier. So it's basically a really odd way to write
>
> n -= initial_bii;
>
> which in turn really doesn't make any sense _either_.

Right, that code clearly doesn't make sense. The only positive about
it is that it hasn't caused any obvious issues so far.

> So I'l all for reverting the commit because it caused a performance
> regression, but the end result really is all kinds of odd.
>
> Is the bug as simple as "we incremented the iterator counter twice"?
> Because in the -E2BIG case, we first increment it by the difference in
> bii, but then we *also* increment it in res_covered_end_of_rgrp()
> (which we do *not* do for the "ret > 0" case, which goes to
> next_iter).

In the "ret > 0" case, rbm->bii should have already been incremented
in gfs2_reservation_check_and_update.

> So if somebody can run the performance test again, how does it look if
> *instead* of the revert, we do what looks at least _slightly_ more
> sensible, and move the "increment iteration count" up to the
> next-bitmap case instead?
>
> At that point, it will actually match the bii updates (because
> next_bitmap also updates rbm->bii). Hmm?
>
> Something like the whitespace-damaged thinig below. Completely
> untested. But *if* this works, it would make more sense than the
> revert..
>
> Hmm?

My idea was to fix that whole loop properly as in this patch instead:

https://www.redhat.com/archives/cluster-devel/2019-January/msg00111.html

That patch just hasn't seen enough testing to make me comfortable
sending it yet. We're testing it right now, and my plan was to get it
into the next merge window. I don't think an intermediate workaround
would make much sense. Does that sound acceptable? Would you rather
have that fix sent to you sometime next week instead?

Thanks a lot,
Andreas