Beyond inotify recursive watches

From: Ramkumar Ramachandra
Date: Mon Mar 18 2013 - 06:48:38 EST


Hi,

We, the Git folks, were wondering how to speed up commands like "git
status". In an strace of "git status" on linux-2.6.git, we found:

  Top syscalls by accumulated time    Top syscalls by number of calls

  seconds   calls  syscall            seconds   calls  syscall
  --------  -----  --------           --------  -----  --------
  0.401906  40950  lstat              0.401906  40950  lstat
  0.190484   5343  getdents           0.150055   5374  open
  0.150055   5374  open               0.190484   5343  getdents
  0.074843   2806  close              0.074843   2806  close
  0.003216    157  read               0.003216    157  read

Most of this happens when we build the index: checking tracked files
for changes and discovering untracked files. It was suggested that we
use inotify to speed this up, by writing a user-wide daemon (like
ssh-agent) that sets up a watch on every directory of every git
repository. A per-repository daemon wouldn't work: on a typical Linux
3.8 system /proc/sys/fs/inotify/max_user_instances is 128, and every
such daemon would use up at least one of those inotify instances.
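For concreteness, here is a minimal sketch of the per-directory setup
such a daemon would need, assuming Linux and glibc; watch_tree() and
watch_one_dir() are hypothetical names, not git code. inotify has no
recursive mode, so every directory in the work tree needs its own
inotify_add_watch() call. Each of those watches counts against
/proc/sys/fs/inotify/max_user_watches, while each inotify instance
(one inotify_init() call) counts against max_user_instances.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/stat.h>

/* nftw() gives its callback no user-data pointer, so the inotify fd and
 * the running count are kept in file-scope variables. */
static int watch_fd = -1;
static int watch_count = 0;

/* nftw() callback: add one inotify watch for every directory visited. */
static int watch_one_dir(const char *path, const struct stat *sb,
                         int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;
    if (typeflag != FTW_D)
        return 0;       /* files are covered by their parent's watch */
    if (inotify_add_watch(watch_fd, path,
                          IN_CREATE | IN_DELETE | IN_MODIFY | IN_ATTRIB |
                          IN_MOVED_FROM | IN_MOVED_TO) < 0) {
        perror(path);   /* ENOSPC once max_user_watches is reached */
        return -1;      /* a nonzero return stops the walk */
    }
    watch_count++;
    return 0;
}

/* Hypothetical helper: watch 'root' and every directory below it.
 * Returns the number of watches added, or -1 if the walk failed. */
int watch_tree(int fd, const char *root)
{
    assert(fd >= 0);
    watch_fd = fd;
    watch_count = 0;
    /* FTW_PHYS: don't follow symlinks; 16 = max fds nftw() may hold open */
    if (nftw(root, watch_one_dir, 16, FTW_PHYS) != 0)
        return -1;
    return watch_count;
}
```

A user-wide daemon would create a single instance with
inotify_init1(0) and call watch_tree(fd, path) once per repository, so
only one of the 128 instances is used, but the watch count still grows
with every directory of every repository.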

However, Karsten and Junio point out that this effort might be futile,
since we would be trying to do what VFS caching already does, only
worse. Any speedup would be minor and certainly not worth the effort.

I think inotify is poorly suited to our needs, as setting up recursive
watches is horribly inelegant. It is a much better fit for something
like Dropbox, which just needs to act when something changes in a
specified directory.

Also, I suspect VFS caching works by speeding up filesystem calls on
frequently used directory entries. A git repository is not a
collection of frequently used directory entries, but a frequently used
unit. I know very little about how the VFS works, but I'm wondering
whether we can change it to perform better with git repositories. We
wouldn't need anything as fine-grained as inotify: if the tree hash of
a directory changes frequently enough, the VFS could optimize all
filesystem calls on inodes under that directory, recursively. Since
recursively optimizing a directory is useless in the general case, I
imagine something like a new rwatch() syscall that git would use to
register a repository with the VFS. Filesystem calls inside the
repository would then be optimized transparently, and git itself would
need only small changes. A side benefit is that other version control
systems could use it too.
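The "horribly inelegant" part of recursive watches is mostly
bookkeeping that stays in userspace. A minimal sketch, assuming Linux
and glibc, with hypothetical names (MAX_WATCHES, add_dir_watch(),
handle_events()): events identify the directory only by watch
descriptor, so the daemon keeps its own descriptor-to-path table, has
to add a watch by hand whenever a subdirectory appears, and still
misses anything created in that subdirectory before its watch exists.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_WATCHES 1024   /* hypothetical fixed-size wd -> path table */

/* Events name the directory only by watch descriptor, so userspace has
 * to remember which path each descriptor stands for. */
static char *wd_path[MAX_WATCHES];

/* Watch one directory and record its path under the returned wd. */
int add_dir_watch(int fd, const char *path)
{
    assert(fd >= 0);
    int wd = inotify_add_watch(fd, path, IN_CREATE | IN_DELETE | IN_MODIFY |
                                         IN_MOVED_FROM | IN_MOVED_TO);
    if (wd < 0 || wd >= MAX_WATCHES)
        return -1;
    free(wd_path[wd]);
    wd_path[wd] = strdup(path);
    return wd;
}

/* Process one read() worth of events and start watching every newly
 * created subdirectory. Directories moved in (IN_MOVED_TO with
 * IN_ISDIR) would need the same treatment; omitted here.
 * Returns the number of watches added. */
int handle_events(int fd)
{
    /* Buffer aligned for struct inotify_event, as in inotify(7). */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int added = 0;
    ssize_t len = read(fd, buf, sizeof buf);

    if (len <= 0)
        return 0;   /* nothing pending (EAGAIN on a nonblocking fd) or error */
    for (char *p = buf; p < buf + len;
         p += sizeof(struct inotify_event) + ((struct inotify_event *)p)->len) {
        const struct inotify_event *ev = (const struct inotify_event *)p;

        /* wd is -1 for IN_Q_OVERFLOW; a real daemon would rescan everything. */
        if (ev->wd < 0 || ev->wd >= MAX_WATCHES || wd_path[ev->wd] == NULL)
            continue;
        if ((ev->mask & IN_CREATE) && (ev->mask & IN_ISDIR)) {
            char child[PATH_MAX];
            snprintf(child, sizeof child, "%s/%s", wd_path[ev->wd], ev->name);
            /* Anything created inside 'child' before this point is never
             * reported, so a real daemon would also have to rescan it. */
            if (add_dir_watch(fd, child) >= 0)
                added++;
        }
    }
    return added;
}
```

The window between a subdirectory's mkdir() and the matching
inotify_add_watch() is why such a daemon would have to rescan every
new directory anyway.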

Thanks for reading.

Ram