Re: Beyond inotify recursive watches

From: Jan Kara
Date: Wed Apr 10 2013 - 16:40:35 EST


On Thu 11-04-13 00:06:02, Ramkumar Ramachandra wrote:
> [Dropping git people from the CC, as this is not relevant to git anymore]
>
> Okay, let me attempt to understand this.
>
> Jan Kara wrote:
> > On Fri 05-04-13 17:12:29, Al Viro wrote:
> >> On Fri, Apr 05, 2013 at 05:55:34PM +0200, Jan Kara wrote:
> >>
> >> > What your question reminds me is an idea of recursive modification time
> >> > stamp on directories. That is a time stamp that gets updated whenever
> >> > anything in the tree under the directory changes. Now this would be too
> >> > expensive to maintain so there's also a trick implemented that you update
> >> > the time stamp (and continue updating recursive time stamps upwards) only
> >> > if a special flag is set on the directory. And you clear the flag at that
> >> > moment. So until someone checks the time stamp and resets the flag no
> >> > further updates of the recursive modification time happen.
>
> If I understand correctly, I'll have to set a flag on the toplevel
> directory of the repository, and this recursive timestamp update magic
> will apply to my entire worktree. How exactly? Are you going to
> store that path somewhere?
Initially, you will have to flip the flag on every directory in the
subtree. But the flag is persistently stored on disk so you have to do it
once when the directory is created and then each time you notice the
directory has changed and the flag has been cleared.

> Whenever there's any modification on the
> filesystem, you can look at the path of the inode you're modifying,
> and see if it's under this path. If it is, we'll have to keep update
> the container dentry's timestamp, and continue recursively until we
> hit the toplevel dentry. On the toplevel dentry, you'll flip a flag
> in addition to modifying the timestamp.
>
> Later, I'll have to check if the timestamp changed from what I have
> remembered in git. If there is a change, I'll look through the
> timestamp of every dentry downwards until I find the modified inode:
> certainly much fewer fs calls. After updating the git index with
> fresh information, I'll have to flip the flag on the toplevel
> directory again.
Yes, that's the intended use. And yes, it has a potential for significant
speedup if modifications are relatively rare / concentrated in a few places
in the directory tree even as is. Just it would be more useful if changed
entries in a directory could be located quickly.

> >> > This scheme works for arbitrary number of processes interested in recursive
> >> > time stamps (only updates of the time stamps get more frequent). What is
> >> > somewhat inconvenient is that this only tells you something in the
> >> > directory or its subtree changed so you still have to scan all the
> >> > directories on the path to modified file. So I'm not sure of how much use
> >> > this would be to you.
>
> I think it's a very useful feature to have in general, not just for
> git or version control systems.
>
> >> Feel free to write up the details of locking you'll need for that. It will
> >> *not* be fun...
>
> Is this what you mean: What happens if two inodes under the toplevel
> directory change nearly simultaneously? The two propagation threads
> will conflict.
I think Al is asking how do we lock kernel dentry cache so that we can
safely walk up the tree and update time stamps in presence of other
modifications happening to the directory tree in parallel.

> > Actually, it shouldn't be too bad if we don't guarantee we walk exactly
> > the path used for modification. Then it is enough to do the same thing as
> > following .. from each directory.
>
> I have no idea what this means.
>
> > And for userspace that should be enough because if timestamp update races
> > with renames or similar actions somewhere up in the three then these
> > operations will generate modification events and update time stamps as
> > well. So userspace will notice there was a change.
>
> Do you mean: as long as updating the timestamp is atomic, it doesn't
> matter than many threads race to update it (it is guaranteed that
> every thread does a successful update)?
It's not as much timestamp updates themselves racing against each other
but rather things like moving directories in the directory tree racing with
us walking up the tree and updating time stamps - in Linux, directory
locking happens in top-bottom manner (like when you do lookup of a path) so
when you are climbing up, one has to be careful not to introduce races.

> > So this part should be doable. But as I wrote before, we might need some
> > fs-internal index to allow efficient tracking of what has changed in one
> > directory anyway and locking rules / costs for that are non-obvious.
>
> Why does it have to be fs-internal, and not at the VFS layer?
One reason why we need things to be fs-internal is that we want to store
everything permanently on disk so that e.g. if there's reboot between
modification of a git tree and 'git add -u', you will still find what has
changed since last time you've checked (without walking the whole tree).

> I don't
> know what VFS looks like, but of the little I know about btrfs:
> There's one global B+ tree where dentry paths are keyed by their CRC32
> hashes. The dentry contains many inodes, and you're worried about
> efficiently tracking which inodes have changed. Why does there have
> to be an efficiency concern there? I suppose multiple inodes'
> timestamp changing simultaneously can spawn threads that race to
> update the dentry's timestamp. Why is this challenge different from
> the recursive propagation challenge?
My concern is that if you have a directory tree where there are lots of
entries in each directory, then you still have to check a lot of entries
before you find what has changed because you have to scan all entries in
each directory on the modified path. If there was a way to iterate only
through entries in a directory which had the flag cleared, things could be
considerably faster.

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/