albods are not a clean set of orthogonal primitives (was Re: File systems are semantically impoveris

Hans Reiser (reiser@ceic.com)
Thu, 24 Jun 1999 15:25:09 +0000 (/etc/localtime)


I feel that you are not orthogonalizing the desired feature into a set
of fully independent orthogonal primitives.

Compression is one example of this. Compression is a fully orthogonal
issue. Why tie it to any other functionality?

Placing the filesystem into user libraries exo-kernel style is another
orthogonal issue. I don't want to start an exo-kernel implementation
right now, especially not without doing it completely and systematically
for all of the filesystem.

Transferring directories and virtual files can be solved in several
simple and effective ways; we should pick one and systematically
implement it for all virtual files. I like the one in which, when you
transfer files, you access a special view of the FS:
/filters-off-and-portable-format-only-visible/pathname/foo (or rather some
such equivalent short named thing) to do the transfer. Tar can use this
too. Someday we ought to use it for symlinks and file holes also (they
are just virtual files too.)
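
A minimal sketch of the view idea, assuming the view is selected purely by a
path prefix. The prefix is the long placeholder name used above, and
resolve_view() is a hypothetical helper for illustration, not an existing
interface:

```python
import posixpath

# Placeholder view name taken from the text above; a real implementation
# would presumably use some short equivalent.
RAW_VIEW = "/filters-off-and-portable-format-only-visible"

def resolve_view(path):
    """Return (filters_enabled, underlying_path) for a requested path.

    Paths under RAW_VIEW name the same objects as the normal namespace,
    but with filters switched off so only the portable format is visible.
    """
    norm = posixpath.normpath(path)
    if norm == RAW_VIEW or norm.startswith(RAW_VIEW + "/"):
        rest = norm[len(RAW_VIEW):] or "/"
        return False, rest
    return True, norm
```

A transfer tool such as tar would open everything through the prefixed path,
while ordinary programs keep using the plain pathname.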

I think that there should be only one interface to the set primitive,
and that streams are hideous for at least the reason that they create a
second kind of set with a second kind of interface.

What I understand here is that you are proposing a second kind of set
with a second kind of interface, so that albod ignorant programs can
transfer an albod without having to understand it. The problem that
leaves unsolved is that albod ignorant programs cannot access its
components because they don't understand it, and that a second
interface is per se bad. I think you feel that albod ignorant programs
don't need to access it. Here I think you are wrong. Richard correctly
pointed out that there is no reason to change emacs to handle any of
this, it can all be done by the FS. He is right, so long as we keep
things clean.

You would have us make it feasible to ignorantly transfer a
directory/albod, at the cost of us not ignorantly using a
directory/albod. I think the use of views makes it feasible to get
both.

Finally, I don't see any need for the word application anywhere in the
name of any of these orthogonal primitives we are creating. What we are
creating has no need to be more application specific than a directory is
application specific. But then, maybe you are one of those folks who
thinks it is clear what is OS and what is application. I am not one.

In summary, each of the following has merit independent of the others:

overloading names so that directories and files can have the same names

filters to convert directories into various formats with rdf as one
default so that they may be edited as flat files, and tar as another so
that they may be transferred. Note that, as Richard pointed out, the
format you want to use for editing with emacs is probably not the format
you want to use for transferring. The format for editing is especially
not tar. With filters come bundled some convention for inserting
certain standard filters into the namespace. (dirname/..rdf, and
dirname/..tar, and dirname/..cat might be good. Note that dirname/..rdf
supports write, but dirname/..cat is a read-only file. Actually, if I
am the filter author, or somebody I pay is, then only dirname/..rdf
would have support for write, because it would be more work to do more,
and I am lazy/cheap. Writing dirname/..rdf would allow specifying a
non-random ordering of the elements in the directory, otherwise reading
dirname/..rdf defaults to whatever order is convenient to the FS).

file body inheritance

stat data inheritance

symlinks (already implemented)

Note that I am unsure whether filters should also be used to implement
file body inheritance, stat data inheritance, and symlinks, and feel
vulnerable to good arguments on the topic.
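
A rough user-space sketch of the filter naming convention, assuming only the
read side of dirname/..cat and dirname/..tar. The function names are
hypothetical, ..rdf is left out because its format is not specified here, and
sorted name order stands in for "whatever order is convenient to the FS":

```python
import io
import os
import tarfile

def cat_filter(dirname):
    """Read-only view: the regular files of dirname, concatenated."""
    chunks = []
    for name in sorted(os.listdir(dirname)):  # assumption: name order
        full = os.path.join(dirname, name)
        if os.path.isfile(full):
            with open(full, "rb") as f:
                chunks.append(f.read())
    return b"".join(chunks)

def tar_filter(dirname):
    """Transfer view: the whole directory as one tar archive."""
    buf = io.BytesIO()
    base = os.path.basename(os.path.normpath(dirname))
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(dirname, arcname=base)
    return buf.getvalue()

# Hypothetical registry mapping virtual leaf names to filters.
FILTERS = {"..cat": cat_filter, "..tar": tar_filter}

def read_virtual(path):
    """Read path, treating dirname/..cat and dirname/..tar as virtual files."""
    dirname, leaf = os.path.split(path)
    if leaf in FILTERS and os.path.isdir(dirname):
        return FILTERS[leaf](dirname)
    with open(path, "rb") as f:
        return f.read()
```

An albod-ignorant program just reads dirname/..tar or dirname/..cat like any
other file; it never needs to know a filter produced the bytes.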

Together, they would allow all the functionality afforded by streams
that Jeremy needs. Yet because they are kept fully orthogonal, they
will be useful for much more than applications asking for streams
functionality. Especially compression.

Now I think it is time to get this implemented. I am going to try to
hire somebody or somebodies for this work this week. As it gets
implemented a lot of the details will fall into place, so I don't really
want to design this more before I have a person assigned to do the work.
If Acy does
filters, I'll focus my guys on the other features, otherwise we'll
implement filters too.

Hans

tytso@mit.edu writes:
>
> So, here's a quick back-of-the-envelope design for a completely
> user-space solution for folks who have been asking for multi-fork files.
> It's not intended to be a completely polished design, but I believe it's
> worth at least considering before rushing off and deciding that the only
> way to do things is to extend Linux's filesystem semantics.
>
> I write this up because people have accused me of just being a
> conservative "Dr. No" who always thinks their great new ideas are
> bad. On the contrary, if application writers (especially office suite
> application writers) are demanding certain sets of functionality, we
> should take such requests seriously, and weigh the costs and benefits of
> what they ask for. It's just that I very strongly believe in trying to
> offer a user-space solution first before resorting to making in-kernel
> solutions. Especially if they are hacks that will only work on Linux
> systems! (Using one's OS market share as a club against
> interoperability is a despicable Microsoft tactic, and not one I want to
> encourage.)
>
>
> Requirements analysis
> =====================
>
> So, let's try this as an exercise. Since no one has actually bothered
> to write down a list of requirements before galloping off to a solution,
> let me try to offer some:
>
> 1) "Common" file manipulation operations should treat an "application
> logical bundle of data" (albod) as if it were a single file. (Forgive
> me for inventing a new acronym here, but "application logical bundle of
> data" is too long to type each time, and I don't want to bias people's
> thinking about how it is actually implemented.)
>
> 2) Applications should be able to quickly and efficiently manipulate
> (read, modify, replace, delete, etc.) individual streams of data within
> an albod. This should be done without the file bloat and inefficiencies
> found in MS Office 97 format files.
>
> 3) There should be standard file streams inside the albod whose
> semantics and data format are standardized, so that programs such as
> graphical file managers can determine basic information about an albod,
> such as which icon to use, who created it, which application should be
> invoked when the albod is activated, etc. quickly and easily. (Using
> file(1) on a data file to determine which application can interpret it
> is considered barbaric.)
>
> 4) It should be easy to send these albods across standard Internet
> protocols using standard, commonly available tools (ftp, http, rcp, scp,
> etc.).
>
> Am I missing any other requirements?
>
>
> Other solutions
> ===============
>
> Now then, which approaches have been used to address this problem in the
> past? In the NTFS and the Macintosh, this was done by adding
> specialized (but non-standard) semantics and new formats in the
> filesystem. This satisfied the first three requirements, but failed on
> the last.
>
> The NeXT used a directory containing individual files, which satisfied
> requirements #2 and #3, but didn't satisfy #1 (except if you only used
> their graphical file manager) and #4 (unless you explicitly tar'ed stuff
> up first).
>
> My proposed straw-man proposal
> ==============================
>
> I now offer to you a design for a potential solution which is purely
> implemented in userspace, and has the advantage that it will work across
> all existing filesystems, including NFS, AFS, Coda, ext2, and doesn't
> require any linux-specific kernel hacks (which is important, since last
> time I checked, the GNOME and KDE folks weren't interested in solutions
> that only worked on Linux). The solution is a directory-based solution,
> like NeXT, but tries to address the rest of the requirements.
>
> First of all, we need some way of distinguishing an "albod" from a
> normal directory. This can either be done using a filesystem specific
> flag, which is probably more efficient, but we would also like a
> filesystem independent way of doing this. So instead of (or perhaps in
> addition to) using a filesystem-provided flag, let's posit a magic
> dotfile in the directory which, if present, marks it as an albod
> bundle.
>
> Now let's assume that we have a hacked libc (or a system-wide
> LD_PRELOAD) which intercepts the open system call. If an application
> does not declare itself (via some API call) to be albod knowledgeable,
> an attempt to open and read the albod results in the user-mode library
> emulation of open()/read() returning a tar-file-like flat-file
> representation of the albod. This allows cp, ftp, httpd, mimeencode,
> etc. to be able to treat an albod as if it were a single "bag of bits".
>
> If the application declares itself to be albod-aware, it can then treat
> the albod as a directory hierarchy, and manipulate the various
> subcomponents of the albod as named streams, just like NTFS5 allows ---
> except that we can have hierarchical named streams, and not just a flat
> namespace!
>
> How are albods written? Well, an albod-aware application simply writes
> the appropriate component directories and files as if they were normal
> Unix files (which in fact, they are). If a non-albod-aware application
> such as /bin/cp writes it, there are two design choices. It's not clear
> which one is better, so let me outline both of them. One is to have the
> user-mode library notice that it is an albod flat-file representation by
> looking at its header, and then automatically unpacking it into its
> directory format as it is writing it out.
>
> The other design choice is to simply allow the albod to be written out
> as a flat file, and when an albod-aware application tries to modify it,
> only then does the albod-flat-file-package get exploded into its
> directory-based form. If the flat-file format is compressed (which
> would be a great idea since applications would now get compression for
> free) then only expanding an albod when it is necessary to read it will
> save disk space for albods which are only accessed occasionally
> in read-only fashion.
>
>
> Problems with this approach
> ===========================
>
> What are the downsides of this approach? Since by default, a
> non-albod-aware application gets the entire packaged albod as a single
> flat-byte-stream representation, /bin/cp, etc. work fine. This is great
> if the albod contains some new application data format, such as a Word
> or an Excel or a Powerpoint competitor, since the actual application
> code which manipulates the application document is albod-aware.
>
> However, if the albod contains a .gif, .mp3, etc. file, where the
> already-existing applications that know how to process the .gif or .mp3
> file aren't albod-aware (think: xv), then having open() return a
> flat-file contents of the entire albod is the wrong behavior. Instead,
> you want to return the default data-fork contents in that case. So what
> we can do is to have a second magic .dotfile or flag which indicates
> that for this albod, when it is opened and read, the default data file
> should be returned instead of a flat-file representation of the albod.
> The tradeoff for using this optional mode is that a naive /bin/cp or
> Midnight Commander program which doesn't know about albod files won't
> know how to copy or move the entire albod. So an attempt to ftp or mail
> this alternate form of the albod will just result in the data fork being
> sent. But if all of the application-specific data (i.e., the .gif or
> the .au data) is in the default data fork, losing the other metadata
> format might not be a disaster, and so this might be the appropriate
> tradeoff. It depends on what extra metadata extensions GNOME or KDE
> wants to store in the albod alongside the .gif or .mp3 data.
>
>
> The other downside with this solution is that it is admittedly pretty
> complex, and there are some subtle issues about how the LD_PRELOAD or
> hacked libc routines should actually work in practice. Some might even
> say that it is a kludge.
>
> On the other hand, is it really that much worse than having kernel-mode
> "reparse points" that manipulate application specific data in the
> kernel?!? I would argue that in contrast, having user-mode library
> hacks may actually be cleaner, although admittedly both solutions aren't
> exactly pretty. Perhaps someone can come up with a yet cleaner
> solution. I hope so!!
>
>
> Summary
> =======
>
> This is obviously not a fully fleshed out design proposal. There are
> obviously lots and lots of details that would need to be filled in
> first, before this could be used as a set of functional specs which an
> implementor could implement. I won't even claim that this is the best
> way to meet the stated requirements solely in user space. Someone may
> come up with a more clever user-space-only solution.
>
> Rather, this was intended to serve as some food for thought, and a proof
> by example that there is a way to do this in user-mode, without requiring
> Linux-specific filesystem hacks and extensions. While it requires some
> extra extensions to the libc, which might be considered kludgy, I
> believe it is no worse than the Microsoft NTFS-style "reparse points"
> suggestion which was offered to the kernel list in the last day or so.
>
> - Ted
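
For reference, the quoted straw-man (a magic dotfile marking an albod, a
flat-file view for programs that have not declared themselves albod-aware, and
an optional second dotfile selecting the default data fork) could be sketched
roughly as below. This is only an illustration under stated assumptions: the
dotfile names, the "data" fork name, and open_for_ignorant_program() are all
hypothetical, and a plain function stands in for the hacked libc or LD_PRELOAD
interception of open().

```python
import io
import os
import tarfile

ALBOD_MARKER = ".albod"            # hypothetical magic dotfile
DATA_FORK_MARKER = ".albod-data"   # hypothetical second dotfile
DEFAULT_FORK = "data"              # hypothetical default data fork name

def is_albod(path):
    """A directory is an albod if it contains the magic dotfile."""
    return os.path.isdir(path) and os.path.exists(
        os.path.join(path, ALBOD_MARKER))

def open_for_ignorant_program(path):
    """What open() for reading might return to a non-albod-aware program."""
    if not is_albod(path):
        return open(path, "rb")            # ordinary file: no emulation
    if os.path.exists(os.path.join(path, DATA_FORK_MARKER)):
        # Optional mode: hand back only the default data fork (the xv case).
        return open(os.path.join(path, DEFAULT_FORK), "rb")
    # Default mode: present the whole albod as one tar-like "bag of bits".
    buf = io.BytesIO()
    base = os.path.basename(os.path.normpath(path))
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(path, arcname=base)
    buf.seek(0)
    return buf
```

An albod-aware application would bypass this wrapper and use the directory
directly; the unpack-on-write and compression choices discussed in the quoted
text are not modelled here.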
