At 8:09 PM -0500 8/29/2000, Roman Zippel wrote:
>So lets get back to the vfs interface
Yes, let's do that.
Every time I hear someone talking about implementing a filesystem, the
words "you are doomed" are usually to be heard somewhere along the lines.
Now, the bits on disk aren't usually the part that kills you - heck, I
repaired an HFS drive with a hex editor once (don't try that at home, kids)
- it's the evil and miserable FS driver APIs that get you. Big ugly
structs, coherency problems with layers upon layers of xyz-cache, locking
nightmares etc.
So, when my boss dropped a multiple-compressed-backed ramdisk filesystem in
my lap and said "make it use less memory", the words "I am doomed" floated
through my head.
Thankfully for the sake of both myself and my sanity, the platform of
choice was QNX 4.
(Obligitory disclaimer: QNX is an embedded operating system, both it's
architecture and target market is considerably different from Linux's)
QNX's filesystem interfaces make it so painfully easy to write a filesystem
that it puts everything else to shame. You can easily write a fully
functioning, race-free, completely coherent filesystem in less than a week,
it's that simple.
When I wanted to make my compressed-backed ramdisk filesystem attach to
multiple points in the namespace with seperate and multiple backings on
each point, in only a single instance of the driver, it was as easy as
changing 10 lines of code.
Now, for those of you who don't have convinient access to QNX4 or QNX
Neutrino (which has an even nicer interface, mostly cleaning up on the QNX4
stuff), here's the disneyfied version of how it all works:
When your filesystem starts up it tells the FS api "hey you, fs api. if
someone needs something under directory FOO, call me". Your filesystem then
wanders off and sleeps in the background 'till someone needs it.
Now, let's say you do an 'ls' on the FOO directory. The FS api would tap
your filesystem on the shoulder and ask "Hey you, what's in the FOO
directory?". Your filesystem would reply "BAR and BAZ".
Now you do 'cat FOO/BAZ >/dev/null', the FS api taps your filesystem on the
shoulder and says "someone wants to open FOO/BAZ". Your filesystem replys
"Yeah, got it open, here's an FD for you". The FS layer then comes back
again and says "I'll take block x y and z from the file on this FD", to
which your filesystem replies "Ok, here it is".
Etc etc, you get the point.
So what does it all mean? Basically, if you want hugely complex dentries,
and inodes as big as your head, you can do that. If you don't, more power
to you. It's all entirely contained inside your specific FS code, the FS
api doesn't care one bit. It just asks you for files.
It also means that you can do cute things like use the exact same API for
block/char/random devices as you do for filesystems. No big fuss over
special files, procfs, devfs, or dead chickens, your device driver just
calls up the FS api and says "hey, I'm /dev/dsp" or "hey, I'll be taking
care of /proc/cpuinfo" and it all "just works".
Also, it means that if you want to represent your multiforked filesystem as
files-as-directories, (can-o-worms: open) you can just do it. No changes to
the FS api, no other filesystems break, etc. Everything "just works".
If someone, ANYONE, could bring this kind of painfully simple FS api to
linux, and make it work, not only would I be eternally in their debt, I
would personally send them a box of genuine canadian maple-sugar candies as
a small token of my infinite thanks.
Even failing that, I urge anyone who would want to look at (re)designing
any filesystem API to look at how QNX does it. It's really a beautiful
thing. Further reading can be found in "Getting Started with QNX Neutrino
2: A Guide for Realtime Programmers", ISBN 0968250114.
I should apologise here for this email being particularily fluffy. It's
getting a bit late here, and I don't want to switch my brain on again
before I go to sleep.
For those of you who would rather not have read through this entire email,
here's the condensed version: VFS is inherintly a wrong-level API, QNX does
it much better. Flame on. :)
Cheers - Tony 'Nicoya' Mantler :)
-- Tony "Nicoya" Mantler - Renaissance Nerd Extraordinaire - nicoya@apia.dhs.org Winnipeg, Manitoba, Canada -- http://nicoya.feline.pp.se/- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Thu Aug 31 2000 - 21:00:25 EST