Re: [ext3][kernels >= at least] KDE going comatose when FS is under heavy write load (massive starvation)

From: Mikulas Patocka
Date: Sat Apr 28 2007 - 02:11:12 EST

On Fri, 27 Apr 2007, Bill Huey wrote:

On Fri, Apr 27, 2007 at 12:50:34PM -0700, Linus Torvalds wrote:
Oh, well.. Journalling sucks.

I was actually _really_ hoping that somebody would come along and tell
everybody that this whole journal-logging is stupid, and that it's just
better to not ever re-write blocks on disk, but instead write to new
blocks with version numbers (and not re-use old blocks until new versions
are stable on disk).

There was even somebody who did something like that for a PhD thesis, I
forget the details (and it apparently died when the thesis was presumably
accepted ;).

That sounds a whole lot like NetApp's WAFL file system and is heavily patented.



SpadFS doesn't write to unallocated parts of the disk like log-structured filesystems (LFS) or phase tree filesystems (TUX2); it writes inside normal used structures, but it marks each structure with generation tags --- when it updates the global table of tags, it atomically makes several structures valid. I don't know of this idea being used elsewhere.
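
To make the generation-tag idea concrete, here is a minimal toy sketch in Python. It is not SpadFS's on-disk format or code: the names `Structure`, `Volume`, `write`, `is_valid` and `commit` are hypothetical, and the single integer `committed_generation` is an assumed stand-in for the global table of tags. It only shows how one small update can make several already-written structures valid at once.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    """A toy on-disk structure carrying a generation tag (hypothetical layout)."""
    name: str
    generation: int


class Volume:
    """Toy model: one integer stands in for the global table of tags."""

    def __init__(self):
        self.committed_generation = 0  # assumed stand-in for the global tag table
        self.structures = []

    def write(self, name):
        # New structures are tagged with the next, not yet committed, generation.
        s = Structure(name, self.committed_generation + 1)
        self.structures.append(s)
        return s

    def is_valid(self, s):
        # A structure counts as valid only once its generation has been committed.
        return s.generation <= self.committed_generation

    def commit(self):
        # A single update of the global value validates every pending structure.
        self.committed_generation += 1


vol = Volume()
inode = vol.write("inode")
dirent = vol.write("dirent")
print(vol.is_valid(inode), vol.is_valid(dirent))  # both still invalid
vol.commit()
print(vol.is_valid(inode), vol.is_valid(dirent))  # both valid after one commit
```

In this toy model, a crash before `commit()` leaves both new structures tagged with an uncommitted generation, so a reader would ignore them; that is the sense in which the table update acts as the atomic switch.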

Its fsync is slow too (it needs to write all (meta)data too), but at least it doesn't livelock --- fsync is basically:
* write all buffers and wait for completion
* take a lock preventing metadata updates
* write all buffers again (those that were updated while the previous write was in progress) and wait for completion
* update the global generation count table
* release the lock
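
The five steps above can be sketched as a toy model in Python. Everything here is an assumption for illustration, not SpadFS code: `ToyFS`, `update`, `_flush` and `fsync` are hypothetical names, dictionaries stand in for the buffer cache and the disk, every update is treated as a metadata update, and one integer stands in for the global generation count table.

```python
import threading


class ToyFS:
    """Toy model of the fsync sequence; dicts stand in for buffers and disk."""

    def __init__(self):
        self.metadata_lock = threading.Lock()   # blocks metadata updates
        self._buffers_lock = threading.Lock()   # protects the dirty dict itself
        self.dirty = {}                          # block number -> pending data
        self.disk = {}                           # block number -> written data
        self.committed_generation = 0            # stand-in for the generation table

    def update(self, block, data):
        # Every update is treated as a metadata update in this sketch.
        with self.metadata_lock:
            with self._buffers_lock:
                self.dirty[block] = data

    def _flush(self):
        # "Write all buffers and wait for completion": take the current dirty
        # set and copy it to the simulated disk synchronously.
        with self._buffers_lock:
            pending, self.dirty = self.dirty, {}
        self.disk.update(pending)

    def fsync(self):
        self._flush()                       # 1. write all buffers, wait
        with self.metadata_lock:            # 2. block further metadata updates
            self._flush()                   # 3. write what was dirtied meanwhile
            self.committed_generation += 1  # 4. update the generation table
        return self.committed_generation    # 5. lock released on leaving "with"


fs = ToyFS()
fs.update(1, b"inode")
fs.update(2, b"dirent")
print(fs.fsync(), sorted(fs.disk))
```

In this model the second flush is bounded, because no new updates can arrive while `metadata_lock` is held; the first, unlocked pass only serves to keep the time spent under the lock short.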

Maybe Suse will be paying me from this autumn to add more features to it --- so far it works and doesn't eat data, but it isn't well known :)

