Re: Extensions to HFS filesystem

Paul H. Hargrove (hargrove@sccm.stanford.edu)
Thu, 23 May 1996 05:11:03 -0700


csmith@stoneboro.uucp.cirr.com wrote:
> In article <318D6B29.167EB0E7@sccm.stanford.edu>
> "Paul H. Hargrove" <hargrove@sccm.stanford.edu> writes:
> There is only one problem that I see remaining and that is the task of
> keeping an accurate count of the number of links to these files. Counting
> links that are created and destroyed under Linux is no problem. However
> if the last is deleted under MacOS then the file continues to use disk
> space until some sort of garbage collection is done (in user space I
> hope). The fact is, however, that by representing a hard link as a
> pseudo-symlink to a hidden file means that a copy of the link under MacOS
> would create an extra link. This means that the kernel can't know when it
> is deleting the last link unless the kernel actually does the garbage
> collection (i.e. link counting) when the fs is mounted. I really don't
> want to do the counting in the kernel.
>
> I think it would work to have the link info contain a self pointer, so
> that it records something like the CNID of both the link (pointer) and
> the hidden file (pointee).
>
> I don't know if that makes sense or not since I don't know anything about
> HFS but what I mean is that the symbolic link that implements the hard
> link can be of the form "12345->/hfs/.hidden/actualfile" where it gives
> both the "inode number" of the link and the name (or whatever) of the
> real file. Copying such a link under MacOS produces a copy, but then
> Linux can know not to believe it since the "inode number" of the copy
> will not match the "inode number" recorded in the copied link.

This makes sense and allows a Mac user to move a hard link within a
filesystem and get the correct results. MacOS-generated copies are
easily turned into symbolic links if such behavior is desired, or
ignored. Something I had not considered before is what happens if a
"hard link" file is copied to a different filesystem. Without any sort
of ID of its original location, its recorded ID could coincide with that
of a file on the new filesystem. Adding the Mac volume name, to yield
something like "Macintosh HD:12345->/hfs/.hidden/actualfile", makes this
problem go away as well in many cases (except that far too many Mac disks
have this particular name!). However, the use of an inode database also
solves all these problems.
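
To make the check concrete, here is a minimal sketch in Python (not
kernel code) of how a link string of the form shown above might be
validated. The names LinkInfo, parse_link and is_genuine_link are
hypothetical, and the exact string layout is only the example format from
this thread, not a settled on-disk format.

```python
from typing import NamedTuple, Optional


class LinkInfo(NamedTuple):
    volume: str  # Mac volume name recorded when the link was created
    cnid: int    # CNID of the link file itself (the "self pointer")
    target: str  # name of the hidden file that holds the real data


def parse_link(text: str) -> Optional[LinkInfo]:
    """Parse 'volume:cnid->target'; return None if the text is malformed."""
    head, arrow, target = text.partition("->")
    if not arrow or not target:
        return None
    # HFS volume names cannot contain ':', so the last ':' ends the name.
    volume, colon, cnid = head.rpartition(":")
    if not colon or not cnid.isdecimal():
        return None
    return LinkInfo(volume, int(cnid), target)


def is_genuine_link(text: str, actual_volume: str, actual_cnid: int) -> bool:
    """True only if the recorded volume and CNID match the file holding them.

    A copy made under MacOS gets a fresh CNID, so its recorded CNID no
    longer matches and the copy is not believed.
    """
    info = parse_link(text)
    return (info is not None
            and info.volume == actual_volume
            and info.cnid == actual_cnid)
```

A file that fails this check could then be treated as a symbolic link or
ignored, as described above.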

> One option I may explore is have fsck.hfs do all the link counting and
> any other post-MacOS-use cleanups, and extend the definition of "dirty"
> to include use of the fs by MacOS since the last use by Linux. Such a
> fsck would remove ALL the links if the luser was dumb enough to delete
> the hidden file under MacOS.
>
> Alternatively, instead of storing a link count (with the hidden file)
> you could store a list of the links, using their "inode numbers".
> When a hard link is deleted you have to examine the list to see how
> many other links remain valid. With the back pointers, when the list
> gets down to one link you can turn it back into a regular file.
>
> It's only necessary to validate the link list once per boot, you could
> store a boot sequence number with the list to remember which ones have
> been checked already. Important for, say, expiring netnews.
>
> This hack subsumes the previous hack -- the links don't have to contain
> their addresses if the list does.

The idea is sound except that under HFS there is no way to locate a file
by its CNID (the equivalent of an inode number). This means that the only
way to do the checking you describe is by brute force: checking every file
on the disk to see if it is a "pointer" file. If this is to be done once
for every boot, the work might as well be done by fsck.hfs, which should be
examining every single file anyway.
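
As a rough illustration of that brute-force pass, here is a small Python
sketch. It assumes the catalog can be walked as (CNID, link target) pairs
and that each hidden file carries a list of the CNIDs of its links; both
the function name surviving_links and these data shapes are hypothetical
stand-ins for whatever fsck.hfs would really read from the catalog tree.

```python
def surviving_links(catalog, link_lists):
    """Validate every hidden file's link list in one pass over the catalog.

    catalog:    iterable of (cnid, target) for every file on the volume;
                target is None for files that are not "pointer" files.
    link_lists: dict mapping hidden-file name -> set of CNIDs recorded
                as links to it.
    Returns a dict mapping hidden-file name -> set of recorded CNIDs that
    still exist and still point at that hidden file.
    """
    alive = {name: set() for name in link_lists}
    for cnid, target in catalog:  # the brute-force scan of every file
        if target in link_lists and cnid in link_lists[target]:
            alive[target].add(cnid)
    return alive
```

A hidden file whose surviving set comes back empty has lost its last link
and can be reclaimed. A pointer file whose CNID is not in the recorded list
(for example a copy made under MacOS) is simply not counted.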

> Under my original proposal every hardlink, symlink and device would use
> the HFS equivalent of one "cluster", which in the worst case is 1/32767
> of the disk and 1/65535 in the best case. If you have a 1Gb HFS fs then
> devices and links would take 16k each, as under UMSDOS. I don't like this
> idea very much so I've been thinking of an alternative. If all the info
> such as device numbers and "link targets" were stored in a single database
> then links and devices would be empty files and would only use the disk
> space required for their directory entries and the space in the database.
>
> Sounds more robust to me. But it's a whole inner filesystem to implement,
> a lot more work. Then again, requiring a fsck every time you boot back and
> forth is well worth avoiding.

The work is not all that bad. The B*-tree manipulation code that is
part of my HFS implementation does all the organizational work for me, since
it is a simple database manager. The only real change is that
reading the inode for a file or directory becomes a 2-step process:
  1) read its catalog entry as I do now, which yields its CNID
  2) read the extra inode info from the database, which is keyed by CNID
Writing an inode to disk updates information in both locations. A file that
doesn't have the extra info gets the same defaults that it would have if it
were read using the "plain" Linux HFS filesystem rather than the "extended" one.
The extra info would only be created when one of these parts of the inode is
changed from its default.
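
The two-step read and the "defaults unless changed" rule can be sketched
as follows. This is Python with plain dicts standing in for the catalog
and the CNID-keyed database, not the actual B*-tree code. The field names
and default values in DEFAULTS are placeholders, and removing a record
once everything is back at its default is my assumption here rather than
part of the plan above. The catalog-side update on write is omitted.

```python
# Placeholder defaults; the real ones would be whatever the "plain"
# Linux HFS filesystem reports for a file.
DEFAULTS = {"mode": 0o100644, "uid": 0, "gid": 0, "rdev": 0}


def read_inode(name, catalog, extra_db):
    """Step 1: catalog lookup yields the CNID. Step 2: overlay extra info."""
    cnid = catalog[name]                   # step 1: catalog entry -> CNID
    inode = dict(DEFAULTS, cnid=cnid)      # start from the plain defaults
    inode.update(extra_db.get(cnid, {}))   # step 2: database keyed by CNID
    return inode


def write_inode(name, inode, catalog, extra_db):
    """Store only the fields that differ from the defaults."""
    cnid = catalog[name]
    extra = {key: value for key, value in inode.items()
             if key in DEFAULTS and value != DEFAULTS[key]}
    if extra:
        extra_db[cnid] = extra
    else:
        # Assumption: drop the record when nothing differs any more.
        extra_db.pop(cnid, None)
```

With this layout an ordinary Mac file costs nothing in the database; only
files that carry non-default information (a device, say) get a record.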

-- 
Paul H. Hargrove                   All material not otherwise attributed
hargrove@sccm.stanford.edu         is the opinion of the author or a typo.