Re: [RFC PATCH v2 1/2] kernfs: use kernfs_node specific mutex and spinlock.

From: Imran Khan
Date: Thu Jan 13 2022 - 05:51:57 EST


Hello,

On 13/1/22 7:48 pm, Greg KH wrote:
> On Wed, Jan 12, 2022 at 10:08:55AM -1000, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Jan 11, 2022 at 10:42:31AM +1100, Imran Khan wrote:
>>> The database application has a health monitoring component which
>>> regularly collects stats from sysfs. With small number of databases this
>>> was not an issue but recently several customers did some consolidation
>>> and ended up having hundreds of databases, all running on the same
>>> server and in those setups the contention became more and more evident.
>>> As more and more customers are consolidating we have started to get more
>>> occurences of this issue and in this case it all depends on number of
>>> running databases on the server.
>>>
>>> I will have to reach out to application team to get a list of all sysfs
>>> files being accessed but one of them is
>>> "/sys/class/infiniband/<device>/ports/<port number>/gids/<gid index>".
>>
>> I can imagine a similar scenario w/ cgroups with heavy stacking - each
>> application fetches its own stat regularly which isn't a problem in
>> isolation but once you put thousands of them on a machine, the shared lock
>> can get pretty hot, and the cgroup scenario probably is more convincing in
>> that they'd be hitting different files but the lock gets hot it is shared
>> across them.
>>
>> Greg, I think the call for better scalability for read operations is
>> reasonably justified especially for heavy workload stacking which is a valid
>> use case and likely to become more prevalent. Given the requirements, hashed
>> locking seems like the best solution here. It doesn't cause noticeable space
>> overhead and is pretty easy to scale. What do you think?
>
> I have no objection to changes that remove the lock contention, as long
> as they do not add huge additional memory requirements, like the
> original submission here did. If using hashed locks is the solution,
> wonderful!
>

Thanks Tejun and Greg, for suggestions and reviews so far.
I have sent v3 of this change at [1]. v3 uses hashed locks and overall
increase in memory footprint will be around 100K at most (with maximum
size of hash table). I will await your feedback regarding this.

[1]:
https://lore.kernel.org/lkml/20220113104259.1584491-1-imran.f.khan@xxxxxxxxxx/

Thanks,
-- Imran