Re: Things I wish I'd known about Inotify

From: Jan Kara
Date: Mon Apr 07 2014 - 05:32:08 EST


On Sun 06-04-14 11:00:29, Michael Kerrisk (man-pages) wrote:
> On 04/04/2014 02:43 PM, Jan Kara wrote:
> > On Fri 04-04-14 09:35:50, Michael Kerrisk (man-pages) wrote:
> >> On 04/03/2014 10:52 PM, Jan Kara wrote:
> >>> On Thu 03-04-14 08:34:44, Michael Kerrisk (man-pages) wrote:
>
> [...]
>
> >>>> Dealing with rename() events
> >>>> The IN_MOVED_FROM and IN_MOVED_TO events that are generated by
> >>>> rename(2) are usually available as consecutive events when reading
> >>>> from the inotify file descriptor. However, this is not guaranteed.
> >>>> If multiple processes are triggering events for monitored
> >>>> objects, then (on rare occasions) an arbitrary number of
> >>>> other events may appear between the IN_MOVED_FROM and IN_MOVED_TO
> >>>> events.
> >>>>
> >>>> Matching up the IN_MOVED_FROM and IN_MOVED_TO event pair generated
> >>>> by rename(2) is thus inherently racy. (Don't forget that if
> >>>> an object is renamed outside of a monitored directory, there may
> >>>> not even be an IN_MOVED_TO event.) Heuristic approaches (e.g.,
> >>>> assume the events are always consecutive) can be used to ensure a
> >>>> match in most cases, but will inevitably miss some cases, causing
> >>>> the application to perceive the IN_MOVED_FROM and IN_MOVED_TO
> >>>> events as being unrelated. If watch descriptors are destroyed
> >>>> and re-created as a result, then those watch descriptors will be
> >>>> inconsistent with the watch descriptors in any pending events.
> >>>> (Re-creating the inotify file descriptor and rebuilding the cache
> >>>> may be useful to deal with this scenario.)
> >>> Well, but there's 'cookie' value meant exactly for matching up
> >>> IN_MOVED_FROM and IN_MOVED_TO events. And 'cookie' is guaranteed to be
> >>> unique at least within the inotify instance (in fact currently it is unique
> >>> within the whole system but I don't think we want to give that promise).
> >>
> >> Yes, that's already assumed by my discussion above (it's described elsewhere
> >> in the page). But your comment makes me think I should add a few words to
> >> remind the reader of that fact. I'll do that.
> > Yes, that would be good.
> >
> >> But, the point is that even with the cookie, matching the events is
> >> nontrivial, since:
> >>
> >> * There may not even be an IN_MOVED_FROM event
> >> * There may be an arbitrary number of other events in between the
> >> IN_MOVED_FROM and the IN_MOVED_TO.
> >>
> >> Therefore, one has to use heuristic approaches such as "allow at least
> >> N milliseconds" or "check the next N events" to see if there is an
> >> IN_MOVED_FROM that matches the IN_MOVED_TO. I can't see any way around
> >> that being inherently racy. (It's unfortunate that the kernel can't
> >> provide a guarantee that the two events are always consecutive, since
> >> that would simplify user space's life considerably.)
>
> > Yeah, it's unpleasant but doing that would be quite costly/complex at the
> > kernel side.
>
> Yep, I imagined that was probably the reason.
I had a look into that code again and it's all designed around the fact
that there's a single inode to notify. If you wanted to have atomic rename
notifications, you'd have to rewrite that to work with two inodes, finding
out whether these two inodes are actually watched by the same group or
not... Doable but complex. Alternatively you could just lock down the whole
notification subsystem while generating rename events. But that's rather
costly. I'm just noting this so that we have the complications written down
somewhere in case someone wants to look into this in the future.

> > And the race would in the worst case lead to the application
> > thinking that a file has been moved outside of the watched area & a file
> > moved somewhere else inside the watched area. So the application will have to
> > possibly inspect that file. That doesn't seem too bad.
>
> It's actually very bad. See the text above. The point is that one likely
> treatment on an IN_MOVED_FROM event that has no IN_MOVED_TO is to remove
> the watches for the moved out subtree. If it turns out that this really
> was just a rename(), then on the IN_MOVED_TO, the watches will be recreated
> *with different watch descriptors*, thus invalidating the watch descriptors
> in any queued but as yet unprocessed inotify events. See what I mean?
> That's quite painful for user space.
But if I understand it right, you lose only the information for the recreated
watches. So you effectively lose all the information about what has
happened inside the subtree of the moved directory (or what has happened with
the moved file). But since you think it's a file / dir moved in from outside
of the watched area, you have to fully rescan that file / dir anyway. Sure,
that's costly, but if your heuristic for detecting renames works 99.9% of the
time it should be OK, shouldn't it? And you have to have the code that handles
caching a file / dir written anyway, for handling real moves from outside of
the watched hierarchy.
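
To make the heuristic above concrete, here is a minimal sketch (in Python
for brevity) of pairing IN_MOVED_FROM with IN_MOVED_TO by cookie while
looking ahead at most a fixed number of events. Unmatched events fall back
to "moved out" / "moved in", the latter being the case that needs a rescan.
The Event tuple, pair_renames() and the lookahead parameter are hypothetical
names made up for illustration; only the two mask values come from
<linux/inotify.h>. It works on already-decoded events and does not read a
real inotify file descriptor.

```python
from collections import namedtuple

# Mask values as defined in <linux/inotify.h>.
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080

# Hypothetical decoded form of struct inotify_event (illustration only).
Event = namedtuple("Event", "wd mask cookie name")


def pair_renames(events, lookahead=8):
    """Classify move events in an already-read batch of events.

    Returns a list of tuples:
      ("rename", from_event, to_event)  cookie matched within the window
      ("moved_out", from_event)         no matching IN_MOVED_TO in the window
      ("moved_in", to_event)            IN_MOVED_TO that was never paired
      ("other", event)                  anything else, passed through
    """
    result = []
    consumed = set()  # indexes of IN_MOVED_TO events already paired
    for i, ev in enumerate(events):
        if i in consumed:
            continue
        if ev.mask & IN_MOVED_FROM:
            match = None
            # Heuristic: inspect only the next `lookahead` events.
            for j in range(i + 1, min(i + 1 + lookahead, len(events))):
                cand = events[j]
                if (j not in consumed and cand.mask & IN_MOVED_TO
                        and cand.cookie == ev.cookie):
                    match = j
                    break
            if match is None:
                result.append(("moved_out", ev))
            else:
                consumed.add(match)
                result.append(("rename", ev, events[match]))
        elif ev.mask & IN_MOVED_TO:
            result.append(("moved_in", ev))
        else:
            result.append(("other", ev))
    return result
```

If the window is too small for a given batch, the IN_MOVED_FROM is reported
as "moved_out" and the later IN_MOVED_TO with the same cookie as "moved_in",
which is exactly the misclassification that forces the rescan.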

Don't get me wrong, I understand it would be easier for userspace to get
atomic rename notifications, I'm just trying to understand what exactly is
painful so that I can compare the cost at the kernel side with the cost at
the userspace side...
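
For reference, the painful scenario as described in this thread can be
modelled with a toy user-space cache. This is only a sketch under stated
assumptions: WatchTable and its methods are hypothetical, nothing here calls
the real inotify API, and the counter merely stands in for
inotify_add_watch() handing out a fresh watch descriptor.

```python
import itertools


class WatchTable:
    """Toy model of a user-space wd -> path cache (no real inotify calls)."""

    def __init__(self):
        # Stand-in for inotify_add_watch() returning a new descriptor.
        self._next_wd = itertools.count(1)
        self.paths = {}

    def add(self, path):
        """Record a watch on path and return its (simulated) descriptor."""
        wd = next(self._next_wd)
        self.paths[wd] = path
        return wd

    def drop_subtree(self, root):
        """Forget every watch at or below root, as after a move out."""
        stale = [w for w, p in self.paths.items()
                 if p == root or p.startswith(root + "/")]
        for wd in stale:
            del self.paths[wd]

    def resolve(self, wd):
        """Return the cached path for wd, or None if the wd is stale."""
        return self.paths.get(wd)


table = WatchTable()
table.add("/w")
old_wd = table.add("/w/a")

# rename("/w/a", "/w/b") misread as an unrelated move out plus move in:
table.drop_subtree("/w/a")
new_wd = table.add("/w/b")

# An event queued before the rename still carries old_wd and can no
# longer be mapped to a path; only new_wd resolves.
print(table.resolve(old_wd), table.resolve(new_wd))
```

In this model the events that still carry the old descriptor are the
information that gets lost, and the rescan of the re-added subtree is what
has to make up for it.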

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR