Re: [PATCH v2] Increase page and bit waitqueue hash size

From: Linus Torvalds
Date: Wed Mar 17 2021 - 15:27:49 EST


On Wed, Mar 17, 2021 at 3:44 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> Argh, because I didn't test small. Sorry I had the BASE_SMALL setting in
> another patch and thought it would be a good idea to mash them together.
> In hindsight probably not even if it did build.

I was going to complain about that code in general.

First complaining about the hash being small, and then adding a config
option to make it ridiculously much *smaller* seemed wrong to begin
with, and didn't make any sense.

So no, please don't smash them together.

In fact, I'd like to see this split up, and with more numbers:

- separate out the bit_waitqueue thing that is almost certainly not
remotely as critical (and maybe not needed at all)

- show the profile numbers _after_ the patch(es)

- explain why you picked the random scaling numbers (21 and 22 for
the two different cases)?

- give an estimate of how big the array now ends up being for
different configurations.

I think it ends up using that "scale" factor of 21, and basically
being "memory size >> 21" and then rounding up to a power of two.

And honestly, I'm not sure that makes much sense. So for a 1GB machine
we get the same as we used to for the bit waitqueue (twice as many for
the page waitqueue), but if you run on some smaller setup, you
apparently can end up with just a couple of buckets.

So I'd feel a lot better about this if I saw the numbers, and got the
feeling that the patch actually tries to take legacy machines into
account.

And even on a big machine, what's the advantage of scaling perfectly
with memory? If you have a terabyte of RAM, why would you need half a
million hash entries (if I did the math right), and use 4GB of memory
on it? The contention doesn't go up with the amount of memory, it goes
up roughly with the number of threads, and the two are very seldom all
that linearly connected.
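
(Sanity-checking my own math there, assuming the scale-21 sizing
sketched above: 1 TiB >> 21 = 2^40 / 2^21 = 2^19 = 524,288 buckets, so
yes, roughly half a million.)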

So honestly, I'd like to see more reasonable numbers. I'd like to see
what the impact of just raising the hash bit size from 8 to 16 is on
that big machine. Maybe still using alloc_large_system_hash(), but
using a low limit of 8 (our traditional very old number that hasn't
been a problem even on small machines), and a high limit of 16 or
something.
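
Very roughly, and purely as a sketch of what I mean (the table and
function names below are made up for illustration, the scale of 21 is
just taken from the patch, and whether the call site can allocate this
early is something the patch would have to get right):

#include <linux/init.h>
#include <linux/memblock.h>
#include <linux/wait.h>

static wait_queue_head_t *page_wait_table __read_mostly;
static unsigned int page_wait_shift __read_mostly;

static void __init page_wait_table_init(void)
{
	unsigned int i;

	page_wait_table = alloc_large_system_hash("page waitqueue hash",
				sizeof(wait_queue_head_t),
				0,		/* size from amount of memory */
				21,		/* scale: ~1 bucket per 2MB */
				0,		/* flags */
				&page_wait_shift,
				NULL,
				1UL << 8,	/* low limit: today's fixed size */
				1UL << 16);	/* high limit */

	for (i = 0; i < (1U << page_wait_shift); i++)
		init_waitqueue_head(&page_wait_table[i]);
}

That way the table can never shrink below what we already have today,
and can never grow past 64k entries no matter how much memory is in
the box.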

And if you want even more, I really really want that justified by the
performance / profile numbers.

And does that "bit_waitqueue" really merit updating AT ALL? It's
almost entirely unused these days. I think maybe the page lock code
used to use that, but then realized it had more specialized needs, so
now it's separate.

So can we split that bit-waitqueue thing up from the page waitqueue
changes? They have basically nothing in common except for a history,
and I think they should be treated separately (including the
explanation for what actually hits the bottleneck).

Linus