Re: [PATCH] cowlinks v2

From: Jamie Lokier
Date: Mon Apr 05 2004 - 07:37:05 EST


jlnance@xxxxxxxxxxxxxx wrote:
> Perhaps diff would run faster but that
> seems like a very special case thing, and diff will certainly work w/o it.

We are talking about a difference between 20 minutes and 1 second.
It's quite significant, when it's a regular part of your diffing &
patching day.

I agree with your general sentiment that we shouldn't expose
filesystem details, e.g. a 32-bit integer. See below for an
alternative interface.

> Tar might also be faster creating archives if it had this information
> available. However to make tar useful wrt cowlinks, it will need to be
> able to create these links at extract time from tarfiles which were created
> on non-cowlink filesystems, so I don't think there is a pressing need.

I agree. The purpose of cowlinks is to be semantically invisible. If tar
or some other archiver/transferer wanted to use this information, it
should really be checking for equivalent files in general (like cmp)
and use this call as an optimisation only.

Btw, when we treat cowlinks as a semantically invisible, there is no
problem searching an entire filesystem for files with identical
content and linking them together to save space, in a cron job. It's
invisible to applications, except that space is saved and sometimes
the first write takes longer.

That still permits the get_data_id() optimisation, but that now
strictly means "kernel knows and returns a unique id of the data
(unique in this filesystem)".

Instead of get_data_id(), we'd use a POSIX attribute called "data-id"
returned by getxattr(). An absence of the attribute indicates that no
data-id is known. Otherwise, it's a unique id for that data in the
current filesystem.

It's a short byte string (another reason for making it a POSIX
attribute). On ext2/ext3, it's just the bytes of the shared inode
number plus a filesystem-wide generation number. On a hypothetical
httpfs, it could be the host name and ETag (a strong validator). On
any filesystem, it could be the SHA1 digest if that is known. It
would have the nice property of working over NFSv4, too.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/