Re: [PATCH] inotify: Increase default inotify.max_user_watches limit to 1048576

From: Amir Goldstein
Date: Tue Oct 27 2020 - 04:20:19 EST


On Mon, Oct 26, 2020 at 10:44 PM Waiman Long <longman@xxxxxxxxxx> wrote:
>
> The default value of inotify.max_user_watches sysctl parameter was set
> to 8192 since the introduction of the inotify feature in 2005 by
> commit 0eeca28300df ("[PATCH] inotify"). Today this value is just too
> small for many modern usage. As a result, users have to explicitly set
> it to a larger value to make it work.
>
> After some searching around the web, these are the
> inotify.max_user_watches values used by some projects:
> - vscode: 524288
> - dropbox support: 100000
> - users on stackexchange: 12228
> - lsyncd user: 2000000
> - code42 support: 1048576
> - monodevelop: 16384
> - tectonic: 524288
> - openshift origin: 65536
>
> Each watch point adds an inotify_inode_mark structure to an inode to be
> watched. Modeled after the epoll.max_user_watches behavior to adjust the
> default value according to the amount of addressable memory available,
> make inotify.max_user_watches behave in a similar way to make it use
> no more than 1% of addressable memory within the range [8192, 1048576].
>
> For 64-bit archs, inotify_inode_mark should have a size of 80 bytes. That
> means a system with 8GB or more memory will have the maximum value of
> 1048576 for inotify.max_user_watches. This default should be big enough
> for most of the use cases.
>

Alas, the memory usage contributed by inotify watches is dominated by the
directory inodes that they pin to cache.

In effect, this change increases the ability of a given user to use:

1048576(max_user_watches)*~1024(fs inode size) = ~1GB
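
To make that arithmetic concrete, here is a minimal sketch. The ~1024 bytes
per inode is the rough figure used above, not a measured value, and the
function name is only for illustration.

```python
def pinned_inode_bytes(max_user_watches: int, inode_size: int = 1024) -> int:
    """Worst-case inode cache pinned if every watch pins a distinct inode.

    inode_size is the rough ~1024-byte estimate from this mail (assumption).
    """
    return max_user_watches * inode_size


if __name__ == "__main__":
    total = pinned_inode_bytes(1048576)
    # 1048576 * 1024 = 2**30 bytes
    print(f"{total} bytes = {total / 2**30:.2f} GiB")
```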

Granted, inotify watches are not the only way to pin inodes to cache, but
the other ways are also resource controlled, for example:
<nproc hardlimit>*<nofile hardlimit>

I did not survey distros for hard limits of nproc and nofile.
On my Ubuntu it's pretty high (63183*1048576). I suppose other distros
may have a lower hard limit by default.
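
For anyone who wants to check their own system, the following is a rough
sketch that reads the two hard limits and multiplies them. It assumes Linux
and Python's standard resource module; open_file_ceiling is a hypothetical
helper name, and the product is only the theoretical per-user bound
discussed above, not a measurement.

```python
import resource


def open_file_ceiling(nproc_hard: int, nofile_hard: int):
    """Return nproc * nofile, or None if either hard limit is unlimited."""
    if resource.RLIM_INFINITY in (nproc_hard, nofile_hard):
        return None
    return nproc_hard * nofile_hard


if __name__ == "__main__":
    # getrlimit() returns a (soft, hard) tuple; only the hard limit matters here.
    _, nproc_hard = resource.getrlimit(resource.RLIMIT_NPROC)
    _, nofile_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    ceiling = open_file_ceiling(nproc_hard, nofile_hard)
    if ceiling is None:
        print("at least one hard limit is unlimited")
    else:
        print(f"{nproc_hard} * {nofile_hard} = {ceiling} open files")
```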

But in any case, open files resource usage has high visibility (via procfs)
and sysadmins and tools are aware of it.

I am afraid this may not be the case with inotify watches. They are also visible
via the inotify fdinfo procfs files, but fewer people and tools know about them.
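
As an illustration of that visibility, here is a minimal sketch that counts
watches per process by scanning /proc/<pid>/fdinfo/*. It relies on inotify
fdinfo entries starting with "inotify wd:", as described in the kernel's proc
documentation; the function names are made up for this example, and without
root it will only see processes it is allowed to read.

```python
import glob


def count_watches(fdinfo_text: str) -> int:
    """Count inotify watch entries in the text of one fdinfo file."""
    return sum(
        1 for line in fdinfo_text.splitlines() if line.startswith("inotify wd:")
    )


def watches_per_pid(proc_root: str = "/proc") -> dict:
    """Map pid -> number of inotify watches found under proc_root."""
    totals = {}
    for path in glob.glob(f"{proc_root}/[0-9]*/fdinfo/*"):
        pid = int(path.split("/")[-3])
        try:
            with open(path) as f:
                n = count_watches(f.read())
        except OSError:
            # Process exited, fd was closed, or we lack permission.
            continue
        if n:
            totals[pid] = totals.get(pid, 0) + n
    return totals


if __name__ == "__main__":
    for pid, n in sorted(watches_per_pid().items(), key=lambda kv: -kv[1]):
        print(f"{pid}\t{n}")
```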

In the end, it's a policy decision, but if you want to claim that your change
will not use more than 1% of addressable memory, it might be better to
use 2*sizeof(struct inode) as a closer approximation of the resource usage.
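
A rough sketch of what that sizing would give, assuming a plain "1% of
memory, clamped to [8192, 1048576]" rule. This is not the kernel code from
the patch; default_max_user_watches is a hypothetical name, and the
1024-byte inode size is the same approximation used earlier in this mail
(the real sizeof(struct inode) depends on kernel config and filesystem).

```python
def default_max_user_watches(mem_bytes: int, per_watch_bytes: int,
                             lo: int = 8192, hi: int = 1048576) -> int:
    """Watches that fit in 1% of mem_bytes, clamped to the range [lo, hi]."""
    budget = mem_bytes // 100
    return max(lo, min(hi, budget // per_watch_bytes))


if __name__ == "__main__":
    mem = 8 * 2**30          # 8 GiB system
    inode_size = 1024        # assumption: rough fs inode size from this mail
    # Patch's accounting: 80 bytes per watch (inotify_inode_mark only).
    print(default_max_user_watches(mem, 80))
    # Suggested accounting: 2 * inode size per watch.
    print(default_max_user_watches(mem, 2 * inode_size))
```

With these assumed numbers the 80-byte accounting hits the 1048576 cap on an
8 GiB machine, while the 2*inode accounting yields a default in the tens of
thousands.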

I believe this conservative estimate will result in a default that covers the
needs of most of the common use cases. Also, in general, a system with
a larger filesystem is likely to have more RAM for caching files anyway.

An anecdote: I started developing the fanotify filesystem watch as a replacement
for inotify (merged in v5.9) for a system that needs to watch many millions of
directories, where pinning all inodes to cache was not an option.

Thanks,
Amir.