RE: [PATCH 1/4] exfat: Simplify exfat_utf8_d_hash() for code points above U+FFFF

From: Kohada.Tetsuhiro@xxxxxxxxxxxxxxxxxxxxxxxxxxx
Date: Mon Apr 06 2020 - 05:40:06 EST


> > If you want to get an unbiased hash value by specifying an 8 or 16-bit
> > value,
>
> Hello! In exfat we have sequence of 21-bit values (not 8, not 16).

hash_32() generates a less-biased hash, even for 21-bit characters.
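
For reference, a rough user-space sketch of what hash_32() does, assuming
the generic definition in linux/hash.h (multiply by GOLDEN_RATIO_32 and keep
the top bits). The name hash_32_user() is only for illustration; it is not a
kernel symbol.

```c
#include <stdint.h>

/* Assumed value of GOLDEN_RATIO_32 from linux/hash.h. */
#define GOLDEN_RATIO_32_USER 0x61C88647u

/*
 * User-space sketch of hash_32(val, bits): multiply by the golden-ratio
 * constant (wrapping at 32 bits) and keep the top 'bits' bits.
 * 'bits' must be in the range 1..32.
 */
static uint32_t hash_32_user(uint32_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_32_USER) >> (32 - bits);
}
```

The multiplication pushes even small inputs into the high bits of the
product, and the shift keeps exactly those bits, so an 8-bit input can still
fill an 8-bit bucket index.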

For a filename made of the following characters, the hash from
partial_name_hash() behaves as follows:
- 21-bit (surrogate pair): the upper 3 bits of the hash tend to be 0.
- 16-bit (mostly CJKV): the upper 8 bits of the hash tend to be 0.
- 8-bit (mostly Latin): the upper 16 bits of the hash tend to be 0.
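
One way to see where these figures can come from is to look at a single
partial_name_hash() step starting from a zero hash. The sketch below is a
user-space copy, assuming the kernel definition is
(prevhash + (c << 4) + (c >> 4)) * 11 and truncating to 32 bits. The helper
names are hypothetical. It only covers the first character; later
characters accumulate into the higher bits.

```c
#include <stdint.h>

/* One partial_name_hash() step (assumed definition), truncated to 32 bits. */
static uint32_t partial_name_hash32(uint32_t c, uint32_t prevhash)
{
	return (prevhash + (c << 4) + (c >> 4)) * 11;
}

/* Number of leading zero bits in a 32-bit value (32 for v == 0). */
static int leading_zero_bits(uint32_t v)
{
	int n = 0;

	if (v == 0)
		return 32;
	while (!(v & 0x80000000u)) {
		v <<= 1;
		n++;
	}
	return n;
}

/*
 * Leading zero bits of the hash after one step on the largest value that
 * fits in 'bits' bits (1..31), i.e. the minimum over all such characters.
 */
static int top_zero_bits_for_width(unsigned int bits)
{
	uint32_t c = (1u << bits) - 1;

	return leading_zero_bits(partial_name_hash32(c, 0));
}
```

With these assumptions, top_zero_bits_for_width() gives 16 for 8-bit,
8 for 16-bit, and 3 for 21-bit input.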

I think the more frequently used Latin/CJKV characters matter more for
hash efficiency than surrogate pair characters do.

The hash of partial_name_hash() for 8/16-bit characters is also biased,
yet it works well in practice.

Surrogate pair characters are used less frequently, and for them the hash of
partial_name_hash() is less biased than it is for 8/16-bit characters.

So I think there is no problem with your patch.


> Did you mean hash_32() function from linux/hash.h?

Oops. I forgot '_'.
hash_32() is correct.


---
Kohada Tetsuhiro <Kohada.Tetsuhiro@xxxxxxxxxxxxxxxxxxxxxxxxxxx>