Detecting changed files for JIT/metadata cacheing

Jamie Lokier (lkd@tantalophile.demon.co.uk)
Sat, 26 Jun 1999 00:00:55 +0200


What I want to achieve
----------------------

I'd like to be able to know if a file has changed since the last time I
cached information about it, efficiently.

Why do I want this? To transparently cache results of operating on
files -- and still provide guarantees of correctness. In my case I want
to cache JIT-compiled code transparently.

There are other uses.

- Stuff like cacheing extracted metadata (to bring in a topical idea :-)
- Cached checksums -- for security checks.

Problem with existing mechanism
-------------------------------

I could just look at mtime & size but (a) mtime can be changed back (and
often is) and (b) it doesn't have infinite resolution. Thus no
guarantee of a correctness.

I could do a checksum but that's too slow. I'm considering JIT-style
transformation of big executables like Emacs & Netscape. Demand-page
checksumming is possible (for my particular app), but still rather
wasteful in time and memory.

Proposal
--------

So I'd like something different from mtime: a modification version
number that could not be reset, and is incremented whenever a modified
file's version is read (which resets the modification flag). (This is
to avoid incrementing on every modification which would overflow too
quickly).

And I'd like that to automatically flag parents modified. In normal
use, where you don't look at the versions, this has no overhead. This
is for a totally different application: speeding up large recursive
directory searches for newly changed files.

Implementing
------------

Ext2 already has a version field per inode, and room for a "modified"
flag. So it'd be a compatible extension and even the API already
exists. (Unless the version field is used in some way I don't
understand).

The version field can be set which IMO rather defeats the point; perhaps
that could be removed? I'd be on for allowing the version to be set
_once_ for a newly created file (ctime can be checked) -- or just
keeping it if we decide that you can always modify the filesystem using
debugfs anyway.

You may guess I don't care about other filesystems -- fall back to
slow checksumming would be fine ;-)

I'm happy to implement -- any interesting comments first?

-- Jamie

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/