Re: Things I wish I'd known about Inotify
From: Michael Kerrisk (man-pages)
Date: Fri Apr 04 2014 - 04:00:12 EST
[CC += Al Viro & Linus, since they also discussed the point about
remote filesystems and /proc and /sys here:
On 04/03/2014 05:38 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>> (To: == [the set of people I believe know a lot about inotify])
>> Hello all,
>> Lately, I've been studying the inotify API fairly thoroughly and
>> realized that there's a very big gap between knowing what the system
>> calls do versus using them to reliably and efficiently monitor the
>> state of a set of filesystem objects.
>> With that in mind, I've drafted some substantial additions to the
>> inotify(7) man page. I would be very happy if folk on the "To:" list
>> could comment on the text below, since I believe you all have a lot of
>> practical experience with Inotify. (Of course, I also welcome comments
>> from anyone else.) In particular, I would like comments on the
>> accuracy of the various technical points (especially those relating to
>> matching up related IN_MOVED_FROM and IN_MOVED_TO events), as well as
>> pointers on any other pitfalls that the programmers should be wary of
>> that should be added to the page.
> Other pitfalls.
> Inotify only reports events that a user space program triggers through
> the filesystem API. Which means inotify is limited for remote
> filesystems, and filesystems like proc and sys have no monitorable
> events.
Good point. I recently got CCed on that very point, but hadn't
added it to the page. I've added it now.
Revised text below, after incorporating changes from your comments and those
of Jan Kara.
Limitations and caveats
The inotify API provides no information about the user or process
that triggered the inotify event. In particular, there is no
easy way for a process that is monitoring events via inotify to
distinguish events that it triggers itself from those that are
triggered by other processes.
Inotify reports only events that a user-space program triggers
through the filesystem API. As a result, it does not catch
remote events that occur on network filesystems. (Applications
must fall back to polling the filesystem to catch such events.)
Furthermore, various virtual filesystems such as /proc, /sys, and
/dev/pts are not monitorable with inotify.
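
As an illustration of the polling fallback mentioned above, here is a minimal sketch. The helper name poll_changed() is hypothetical, and comparing the inode number, size, and modification time is only one possible change heuristic; it assumes that stat(2) results on the network filesystem are reasonably fresh.

```c
#include <sys/stat.h>

/* Hypothetical helper: compare the file at `path` against the snapshot
 * in `*last`. Returns 1 if it changed (and refreshes the snapshot),
 * 0 if it looks unchanged, -1 if stat() fails (e.g. the file is gone). */
int poll_changed(const char *path, struct stat *last)
{
    struct stat now;

    if (stat(path, &now) == -1)
        return -1;

    /* st_mtime has one-second granularity, so size and inode number
     * are compared as well to catch some same-second changes. */
    int changed = now.st_ino != last->st_ino ||
                  now.st_size != last->st_size ||
                  now.st_mtime != last->st_mtime;

    *last = now;
    return changed;
}
```

A caller would take an initial snapshot with stat(2) and then call the helper periodically; how often to poll is an application decision.
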
The inotify API identifies affected files by filename. However,
by the time an application processes an inotify event, the
filename may already have been deleted or renamed.
The inotify API identifies events via watch descriptors. It is
the application's responsibility to cache a mapping (if one is
needed) between watch descriptors and pathnames. Be aware that
directory renamings may affect multiple cached pathnames.
Inotify monitoring of directories is not recursive: to monitor
subdirectories under a directory, additional watches must be
created. This can take a significant amount of time for large
directory trees.
If monitoring an entire directory subtree, and a new subdirectory
is created in that tree or an existing directory is renamed into
that tree, be aware that by the time you create a watch for the
new subdirectory, new files (and subdirectories) may already
exist inside the subdirectory. Therefore, you may want to scan
the contents of the subdirectory immediately after adding the
watch (and, if desired, recursively add watches for any
subdirectories that it contains).
Note that the event queue can overflow. In this case, events are
lost. Robust applications should handle the possibility of lost
events gracefully. For example, it may be necessary to rebuild
part or all of the application cache. (One simple, but possibly
expensive, approach is to close the inotify file descriptor,
empty the cache, create a new inotify file descriptor, and then
re-create watches and cache entries for the objects to be monitored.)
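
Detecting the overflow is the first step of any such recovery. The sketch below walks a buffer filled by read(2) on an inotify file descriptor and reports whether it contains an IN_Q_OVERFLOW event; the function name buffer_has_overflow() is hypothetical, and the recovery itself (rebuilding watches and cache) is left to the application.

```c
#include <stddef.h>
#include <string.h>
#include <sys/inotify.h>

/* Hypothetical helper: returns 1 if the `len` bytes at `buf`, as
 * returned by read(2) on an inotify file descriptor, include an
 * IN_Q_OVERFLOW event, and 0 otherwise. */
int buffer_has_overflow(const char *buf, size_t len)
{
    size_t off = 0;

    while (off + sizeof(struct inotify_event) <= len) {
        struct inotify_event ev;

        /* Copy the fixed-size header out to avoid unaligned access. */
        memcpy(&ev, buf + off, sizeof(ev));
        if (ev.mask & IN_Q_OVERFLOW)
            return 1;

        /* ev.len is the size of the name field that follows the header. */
        off += sizeof(ev) + ev.len;
    }
    return 0;
}
```
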
Dealing with rename() events
As noted above, the IN_MOVED_FROM and IN_MOVED_TO event pair that
is generated by rename(2) can be matched up via their shared
cookie value. However, the task of matching has some challenges.
These two events are usually consecutive in the event stream
available when reading from the inotify file descriptor.
However, this is not guaranteed. If multiple processes are
triggering events for monitored objects, then (on rare occasions) an
arbitrary number of other events may appear between the
IN_MOVED_FROM and IN_MOVED_TO events.
Matching up the IN_MOVED_FROM and IN_MOVED_TO event pair
generated by rename(2) is thus inherently racy. (Don't forget that if
an object is renamed outside of a monitored directory, there may
not even be an IN_MOVED_TO event.) Heuristic approaches (e.g.,
assume the events are always consecutive) can be used to ensure a
match in most cases, but will inevitably miss some cases, causing
the application to perceive the IN_MOVED_FROM and IN_MOVED_TO
events as being unrelated. If watch descriptors are destroyed
and re-created as a result, then those watch descriptors will be
inconsistent with the watch descriptors in any pending events.
(Re-creating the inotify file descriptor and rebuilding the cache
may be useful to deal with this scenario.)
Applications should also allow for the possibility that the
IN_MOVED_FROM event was the last event that could fit in the
buffer returned by the current call to read(2), and the accompanying
IN_MOVED_TO event might be fetched only on the next read(2).
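
One possible way to allow for this is to wait briefly for further input before concluding that an IN_MOVED_FROM event has no partner. The sketch below uses poll(2) for that check; the helper name more_events_pending() is hypothetical, and a suitable timeout is an application choice that this sketch does not try to recommend.

```c
#include <poll.h>

/* Hypothetical helper: returns 1 if `ifd` becomes readable within
 * `timeout_ms` milliseconds, 0 on timeout, and -1 on error (including
 * interruption by a signal, in which case errno is EINTR). */
int more_events_pending(int ifd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = ifd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int r = poll(&pfd, 1, timeout_ms);
    if (r == -1)
        return -1;
    if (r > 0 && (pfd.revents & POLLIN))
        return 1;
    return 0;
}
```

If the helper returns 1, the application would call read(2) again and look for the matching IN_MOVED_TO event before treating the rename as a move out of the monitored directories.
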
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/