Re: Another Way

Greg Lindahl (lindahl@cs.virginia.edu)
Mon, 5 Jul 1999 16:10:39 -0400 (EDT)


> The usual complaint (pretty much the only one) is one of efficiency.

I don't think that efficiency is a big enough complaint to put compound
files in the kernel, for reasons I'll explain:

> I'm sure you've noticed people complain that Word's "Fast Save"
> generates enormous files -- mostly unused data.

Yes. But that's implementation stupidity. Word "fast saves" changes as
deltas from the original document, which leaves all the deleted data
and obsolete deltas lying around. Emacs, on the other hand, uses
buffer gaps and always rewrites the whole file. Emacs is often slower
when saving, but it generates small save files and is faster when
doing elementary operations like searching, because searching a linear
file is much easier than searching a file composed of a bunch of deltas.

So there's the same problem in an historical perspective: emacs vs. vi
;-) (although vi is smarter than Word, too.)
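
For anyone who hasn't seen one, here is a minimal sketch of the buffer-gap
idea, in Python for brevity. It is an illustration only, not how Emacs
actually implements it; the class and method names are made up. Edits are
cheap near the gap, and a save simply writes out the linear text.

```python
class GapBuffer:
    """Text buffer with a movable gap at the edit point (illustrative only)."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)    # first free slot
        self.gap_end = len(self.buf)  # one past the last free slot

    def _move_gap(self, pos):
        # Shift characters across the gap until the gap starts at pos.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self, extra):
        # Replace the current gap with a fresh one of `extra` free slots.
        tail = self.buf[self.gap_end:]
        self.buf = self.buf[:self.gap_start] + [None] * extra + tail
        self.gap_end = self.gap_start + extra

    def insert(self, pos, text):
        self._move_gap(pos)
        if self.gap_end - self.gap_start < len(text):
            self._grow(len(text) + 16)
        for ch in text:
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, count):
        # Deleting just widens the gap over the following characters.
        self._move_gap(pos)
        self.gap_end = min(self.gap_end + count, len(self.buf))

    def text(self):
        # The linear contents: this is what a full-file save would write.
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

A save is then just a write of text() over the old file, so nothing
deleted or superseded ever reaches the disk.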

> And there's fragmentation, if you try to keep the file roughly as large
> as the data (no holes) without rewriting it each time.

Is rewriting necessarily bad? Personally, I would rather rewrite the
file than put compound files in the kernel. Disks are fast, and most
of my documents are only a few megabytes in size.

> Which brings us to a new idea.
>
> Imagine a user space library for components in a file. It's a bit like
> a filesystem or database. I bet Bonobo & OpenParts have one already.

Microsoft has one, too ;-) ;-) There are a lot of state-of-the-art
examples for you to look at (Amiga, Mac, etc.)

Moving on to kernel support that might be useful for many projects:

> A few ideas tossed into the air:
>
> fedit (insert/delete/swap byte ranges, appropriate fall backs
> e.g. limited to multiples of block size, fail on some filesystems),

This would also be useful for databases. Databases (e.g. Oracle and
Sybase) have the same fragmentation problem you mention above. Part of
the usual "tune up" procedure for a database which has had a lot of
records inserted over a long period of time is to recopy all the data
and indices. If a database could insert blocks in the middle of the
database and hopefully have these blocks land near their neighbors,
life would be much better.
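
To make the cost concrete: without something like fedit, inserting in
the middle of a file from user space means shifting everything after
the insertion point. A rough sketch, in Python for brevity; insert_range
and its chunk parameter are hypothetical names, not an existing API, and
no database necessarily works this way.

```python
import os

def insert_range(path, offset, data, chunk=64 * 1024):
    """Insert `data` at `offset` by shifting the rest of the file forward."""
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        if not 0 <= offset <= size:
            raise ValueError("offset outside file")
        # Copy the tail last chunk first, so no byte is overwritten
        # before it has been read.
        pos = size
        while pos > offset:
            n = min(chunk, pos - offset)
            pos -= n
            f.seek(pos)
            block = f.read(n)
            f.seek(pos + len(data))
            f.write(block)
        # The hole at `offset` is now free for the new data.
        f.seek(offset)
        f.write(data)
```

The work is proportional to the amount of data after the insertion
point, not to the size of the insert, which is why a block-granular
insert done by the filesystem looks attractive.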

-- greg
