Re: [RFC 0/2] erofs: introduce bloom filter for xattr
From: Jingbo Xu
Date: Thu Jun 22 2023 - 02:37:11 EST
On 6/21/23 7:50 PM, Alexander Larsson wrote:
> On Wed, Jun 21, 2023 at 10:32 AM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>>
>> Background
>> ==========
>> Filesystems with ACL enabled generally need to read
>> "system.posix_acl_access"/"system.posix_acl_default" xattr to get the
>> access and default ACL. When filesystem is mounted with ACL enabled
>> while files in the system have not set access/default ACL, the getattr()
>> will run in vain while the round trip can decrease the performance in
>> workload like "ls -lR".
>>
>> For example, there's a 12% performance boost if erofs is mounted with
>> "noacl" when running "ls -lR" workload on dataset [1] (given in [2]).
>>
>> We'd better offer a fastpath to boost the above workload, as well as
>> other negative xattr lookup.
>>
>>
>> Proposal
>> ========
>> Introduce a per-inode bloom filter for xattrs to boost the negative
>> xattr queries.
>>
>> As following shows, a 32-bit bloom filter is introduced for each inode,
>> describing if a xattr with specific name exists on this inode.
>>
>> ```
>> struct erofs_xattr_ibody_header {
>> - __le32 h_reserved;
>> + __le32 h_map; /* bloom filter */
>> ...
>> }
>> ```
>>
>> Following are some implementation details for bloom filter.
>>
>> 1. Reverse bit value
>> --------------------
>> The bloom filter structure describes if a given data is inside the set.
>> It will map the given data into several bits of the bloom filter map.
>> The data must not exist inside the set if any mapped bit is 0, while the
>> data may be not inside the set even if all mapped bits is 1.
>>
>> While in our use case, as erofs_xattr_ibody_header.h_map is previously a
>> (all zero) reserved field, the bit value for the bloom filter has a
>> reverse semantics in consideration for compatibility. That is, for a
>> given data, the mapped bits will be cleared to 0. Thus for a previously
>> built image without support for bloom filter, the bloom filter is all
>> zero and when it's mounted by the new kernel with support for bloom
>> filter, it can not determine if the queried xattr exists on the inode and
>> thus will fallback to the original routine of iterating all on-disk
>> xattrs to determine if the queried xattr exists.
>>
>>
>> 2. The number of hash functions
>> -------------------------------
>> The optimal value for the number of the hash functions (k) is (ln2 *
>> m/n), where m stands the number of bits of the bloom filter map, while n
>> stands the number of all candidates may be inside the set.
>>
>> In our use case, the number of common used xattr (n) is approximately 8,
>> including system.[posix_acl_access|posix_acl_default],
>> security.[capability|selinux] and
>> security.[SMACK64|SMACK64TRANSMUTE|SMACK64EXEC|SMACK64MMAP].
>>
>> Given the number of bits of the bloom filter (m) is 32, the optimal value
>> for the number of the hash functions (k) is 2 (ln2 * m/n = 2.7).
>
> This is indeed the optimal value in a traditional use of bloom
> filters. However, I think it is based on a much larger set of values.
> For this usecase it may be better to choose a different value.
>
> I did some research a while ago on this, and I thought about the
> counts too. Having more than one hash function is useful because it
> allows you to avoid problems if two values happen to hash to the same
> bucket, but this happens at the cost of there being less "unique
> buckets". I spent some time looking for common xattr values
> (including some from userspace) and ended up with a list of about 30.
Yeah, if the number of common used xattr (n) is 30, then the optimal
value for the number of the hash functions (k) is 1 (ln2 * m/n = 0.74).
The optimal value in theory also matches our intuition.
> If we can choose a single hash function that maps all (or most) of
> these to a unique bucket (mod 32),
Excellent research! Would you mind sharing the list of these
approximately 30 commonly used xattrs, so that I could check if they are
mapped to unique bucket with the single hash function we proposed?
--
Thanks,
Jingbo