Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function

From: Linus Torvalds
Date: Wed Jun 01 2016 - 21:19:01 EST

On Mon, May 30, 2016 at 11:10 AM, George Spelvin
<linux@xxxxxxxxxxxxxxxxxxx> wrote:
> I understand, but 64x64-bit multiply on 32-bit is pretty annoyingly
> expensive. In time, code size, and register pressure which bloats
> surrounding code.

Side note, the code seems to work fairly well, but I do worry a bit
about the three large multiplies in link_path_walk().

There's two in fold_hash(), and one comes from "find_zero()".
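
For reference, a minimal user-space sketch of where those three
multiplies come from. This is an illustration under stated assumptions,
not a copy of fs/namei.c: the names fold_hash_sketch(),
has_zero_sketch() and find_zero_sketch() and the SKETCH_MULT constant
are hypothetical, a 64-bit little-endian word is assumed, and
find_zero_sketch() is only meaningful when its argument is nonzero.

```c
#include <stdint.h>

/* Assumed odd 64-bit multiplier (golden-ratio style); illustrative only. */
#define SKETCH_MULT 0x61C8864680B583EBull

#define ONE_BITS  0x0101010101010101ull
#define HIGH_BITS 0x8080808080808080ull

/* Multiplies 1 and 2: two dependent 64x64 multiplies mix x and y to 32 bits. */
static inline uint32_t fold_hash_sketch(uint64_t x, uint64_t y)
{
	y ^= x * SKETCH_MULT;
	y *= SKETCH_MULT;
	return (uint32_t)(y >> 32);
}

/*
 * Nonzero iff some byte of 'a' is zero. The lowest set bit is the high
 * bit of the first (least significant) zero byte.
 */
static inline uint64_t has_zero_sketch(uint64_t a)
{
	return (a - ONE_BITS) & ~a & HIGH_BITS;
}

/*
 * Multiply 3: turn the result of has_zero_sketch() into a byte index.
 * The mask has 0xff in every byte below the first zero byte; the
 * multiply sums those bytes into the top byte of the product.
 * Precondition: bits != 0.
 */
static inline unsigned find_zero_sketch(uint64_t bits)
{
	uint64_t mask = ((bits - 1) & ~bits) >> 7;

	return (unsigned)((mask * 0x0001020304050608ull) >> 56);
}
```

With the bytes "abc" followed by NULs loaded as a little-endian word
(0x0000000000636261), find_zero_sketch(has_zero_sketch(word)) gives 3,
the length of the component. The two multiplies in the fold are
dependent, so their latencies add up; the one in find_zero is
independent of them.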

It turns out to work fairly well on at least modern big-core x86
CPUs, because the multiplier is fairly beefy: low latency (3-4 cycles
in the current crop) and fully pipelined.

Even Atom should have a 5-cycle latency and produce a multiplication
result every two cycles for 64-bit results.

Maybe we don't care, because looking around, the modern ARM and POWER
cores do similarly, but I just wanted to point out that that code does
seem to rely fairly heavily on "everybody has big and pipelined hw
multipliers" for performance.

.. and it's probably true that transistors are cheap, and crypto and
other uses have made CPU designers spend the effort on good
multipliers. I just remember a time when you definitely couldn't rely
on fast multiplies.