Re: 2.4.19pre2aa1

From: William Lee Irwin III (wli@holomorphy.com)
Date: Thu Mar 07 2002 - 05:49:42 EST


On Thu, Mar 07, 2002 at 09:21:19AM +0100, Andrea Arcangeli wrote:
> Changed the wait_table stuff, first of all have it per-node (no point
> of per-zone),

Yes there is. It's called locality of reference. Zones exhibit distinct
usage patterns and affinities; hence, they should have distinct tables.
It is unwise to allow pressure on one zone (i.e. many waiters there) to
interfere with (by creating collisions) efficient wakeups for other zones.

On Thu, Mar 07, 2002 at 09:21:19AM +0100, Andrea Arcangeli wrote:
> then let it scale more with the ram in the machine (the
> amount of ram used for the wait table is ridicolously small and it
> mostly depends on the amount of the I/O, not on the amount of ram, so
> set up a low limit instead of an high limit).

The wait_table does not need to scale with the RAM on the machine
        It needs to scale with the number of waiters.

4096 threads blocked on I/O is already approaching or exceeding the
scalability limits of other core kernel subsystems.

On Thu, Mar 07, 2002 at 09:21:19AM +0100, Andrea Arcangeli wrote:
> The hashfn I wasn't
> very confortable with.

I am less than impressed by this unscientific rationale.

On Thu, Mar 07, 2002 at 09:21:19AM +0100, Andrea Arcangeli wrote:
> This one is the same of pagemap.h, and it's
> not that huge on the 64bit archs. If something it had to be a mul.

It was a mul. It was explicitly commented that the mul was expanded to a
shift/add implementation to accommodate 64-bit architectures, and the
motivation for the particular constant to multiply by was documented in
several posts of mine. It didn't "have to be", it was, and was commented.

This is not mere idle speculation and unfounded micro-optimization.
This is a direct consequence of reports of poor performance due to the
cost of multiplication on several 64-bit architectures, which was
resolved by the expansion of the multiplication to shifts and adds
(the multiplier was designed from inception to support expansion to shifts
and adds; it was discovered the compiler could not do this automatically).

And for Pete's sake that pagemap hash function is ridiculously dirty code;
please, regardless of whether you insist on being inefficient for the
sake of some sort of vendetta, or deny the facts upon which the prior
implementation was based, or just refuse to cooperate until the bitter
end, please, please, clean that hash up instead of copying and pasting it.

On Thu, Mar 07, 2002 at 09:21:19AM +0100, Andrea Arcangeli wrote:
> This hashfn ensures to be contigous on the cacheline for physically
> consecutive pages, and once the pages are randomized they just provide
> enough randomization, we just need to protect against the bootup when
> it's more likely the pages are physically consecutive.

Is this some kind of joke? You honestly believe some kind of homebrew
strategy to create collisions is remotely worthwhile? Do you realize
even for one instant that the cost of collisions here is *not* just a
list traversal, but also the spurious wakeup of blocked threads? You've
gone off the deep end in an attempt to be contrary; this is even more
than demonstrably inefficient, it's so blatant inspection reveals it.

Designing your hash function to create collisions is voodoo, pure and
simple. "Just enough randomization" is also voodoo. The pagemap.h hash
was one of the specific hash functions in the kernel producing
measurably horrific hash table distributions on large memory machines,
and my efforts to find a better hash function for it contributed
directly to the design of the hash functions presented in the hashed
waitqueue code. The (rather long) lengths I've gone to to prevent
collisions are justified by the low impact on the implementation (the
choice of multiplier) and the extremely high cost of collisions in this
scheme, which comes from the wake-all semantics of the hash table buckets.

This reversion to a demonstrably poor hashing scheme in the face of
very high collision penalties is laughable.

Bill
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 07 2002 - 21:01:01 EST