Needed: kernel API for requesting/caching of file hashes

From: Uecker, Martin
Date: Sun Dec 08 2019 - 13:40:23 EST



Hi all,

here is a proposal for a new API. Unfortunately,
I never found time to work on a kernel implementation,
so I thought I would just throw it out there. In my opinion,
it could be very useful for a variety of applications.


The new API would allow user space to request the
hash of a file from the kernel, with the kernel
guaranteeing that the hash is up-to-date and
correct.

Requesting a hash for the first time would
trigger the computation of the hash. If the file
is modified during the computation, the computation
is aborted and an error is returned (possibly
as early as when a writable mapping is created).
The hash is then cached by the kernel and/or
the file system in a protected attribute (i.e.
one not modifiable by non-privileged users).
If the hash is requested again, the cached hash
is returned, but only if the file hasn't changed in
the meantime. There would need to be options to return
the old stale hash and also to refresh the hash.
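
To make the request semantics above more concrete, here is
a rough user-space model in Python. It is only a sketch: the
names (HashCache, request, allow_stale, refresh and the two
exceptions) are made up for illustration, SHA-256 is an
arbitrary choice, and comparing stat() results is merely a
stand-in for the reliable change tracking the kernel would
provide. Raising an error by default when the file has
changed is one possible reading of the description, not a
settled part of the proposal.

```python
import hashlib
import os


class FileChangedError(Exception):
    """The file was modified while its hash was being computed."""


class StaleHashError(Exception):
    """A cached hash exists, but the file has changed since."""


def _signature(path):
    # Stand-in for the kernel's knowledge of modifications (assumption).
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


class HashCache:
    def __init__(self):
        self._entries = {}  # path -> (signature, hex digest)

    def _compute(self, path):
        before = _signature(path)
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        # Abort if the file changed while it was being read.
        if _signature(path) != before:
            raise FileChangedError(path)
        digest = h.hexdigest()
        self._entries[path] = (before, digest)
        return digest

    def request(self, path, allow_stale=False, refresh=False):
        entry = self._entries.get(path)
        if entry is not None and not refresh:
            sig, digest = entry
            # Cached hash is valid only if the file is unchanged,
            # unless the caller explicitly accepts a stale value.
            if sig == _signature(path) or allow_stale:
                return digest
            raise StaleHashError(path)
        # First request, or an explicit refresh: compute the hash.
        return self._compute(path)
```

A caller would use cache.request(path) for the normal case,
pass allow_stale=True to get the old value after a change,
and pass refresh=True to force a recomputation.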

Of course, hashes can also be computed and
stored in attributes from a user-space
program, and here is my proof-of-principle
implementation of this idea:

https://github.com/uecker/hashcache


... but moving something like this into the
kernel would solve two problems:

1) it would make it possible to reliably detect
file changes

2) it would make it possible to share the
(expensive to compute) hashes between different
applications / users even if they do not
trust each other
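
To illustrate both points, here is a minimal sketch of the
user-space variant in Python (Linux only, since it relies on
os.getxattr/os.setxattr). It is not taken from the hashcache
repository; the attribute name "user.example.sha256" and the
"size:mtime digest" encoding are assumptions made up for
this example.

```python
import hashlib
import os

XATTR = "user.example.sha256"  # hypothetical attribute name


def cached_sha256(path):
    st = os.stat(path)
    # Size and mtime are the only change indicators user space has.
    tag = f"{st.st_size}:{st.st_mtime_ns}"
    try:
        stored = os.getxattr(path, XATTR).decode()
        stored_tag, _, digest = stored.rpartition(" ")
        if stored_tag == tag:
            return digest  # cache hit, trusted blindly
    except (OSError, ValueError):
        pass  # no attribute yet, xattrs unsupported, or garbage
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    try:
        os.setxattr(path, XATTR, f"{tag} {digest}".encode())
    except OSError:
        pass  # caching is best effort only
    return digest
```

Anyone who may write the file can also rewrite a user.*
attribute or reset the mtime with utime(), so the cached
value can neither prove that the file is unchanged nor be
trusted by another user. A kernel-maintained, protected
attribute would close both gaps.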


Users of such an API could be version control
systems (it is similar to the git index moved
to the kernel, I think), rsync, tools for backup,
de-duplication, and file sharing, integrity
checkers, etc.


The implementation could probably be based on IMA
(although it has a different goal) and could
reuse hashes from file systems (or fs-verity)
where they are available.


Regards,
Martin