Re: Finding hardlinks

From: Miklos Szeredi
Date: Mon Jan 08 2007 - 03:51:26 EST


> >> No one guarantees you sane result of tar or cp -a while changing the tree.
> >> I don't see how is_samefile() could make it worse.
> >
> > There are several cases where changing the tree doesn't affect the
> > correctness of the tar or cp -a result. In some of these cases using
> > samefile() instead of st_ino _will_ result in a corrupted result.
>
> ... and those are what?

- /a/p/x and /a/q/x are links to the same file

- /b/y and /a/q/y are links to the same file

- tar is running on /a

- meanwhile the following commands are executed:

mv /a/p/x /b/x
mv /b/y /a/p/x

With st_ino checking you'll get a perfectly consistent archive,
regardless of the timing. With samefile() you could get an archive
where the data in /a/q/y is not stored, instead it will contain the
data of /a/q/x.

Note, this is far nastier than the "normal" corruption you usually get
with changing the tree under tar, the file is not just duplicated or
missing, it becomes a completely different file, even though it hasn't
been touched at all during the archiving.

The basic problem with samefile() is that it can only compare files at
a single snapshot in time, and cannot take into account any changes in
the tree (unless keeping files open, which is impractical).

There's really no point trying to push for such an inferior interface
when the problems which samefile is trying to address are purely
theoretical.

Currently linux is living with 32bit st_ino because of legacy apps,
and people are not constantly agonizing about it. Fixing the
EOVERFLOW problem will enable filesystems to slowly move towards 64bit
st_ino, which should be more than enough.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/