Re: ext3-0.0.2e released

From: Daniel Phillips (news-list.linux.kernel@innominate.de)
Date: Sat Jul 29 2000 - 08:11:57 EST


David Gould wrote:
>
> On Tue, Jul 25, 2000 at 12:16:42PM +0200, Daniel Phillips wrote:
> > Please excuse me for the delay in responding to this...
>
> Me too...
>
> > "Stephen C. Tweedie" wrote:
> > >
> > > On Thu, Jul 06, 2000 at 09:19:58PM +0100, Steve Whitehouse wrote:
> > > > can you explain "phase tree" and/or give a reference ?
> > >
> > > For a reference, wait until ALS and see the ext2-derived filesystem
> > > report. :-)
> > 
> > Which is when I'll deliver my white paper on Tux2fs - thanks for
> > being mysterious and building up the anticipation. ;-)
> >
> > > For basic background, look up some of the WAFL white papers from
> > > NetApps. The basic idea is an old database one: you have your
> > > filesystem in a tree, and whenever you modify the tree, you write into
> > > new blocks. Then the next level up in the tree --- which contains
> > > pointers to the old blocks --- gets modified to point to the new
> > > blocks, and those changes too get written to new blocks, so you then
> > > need to update the pointers in the _next_ level up the tree.
> > >
> > > So you do your changes right up the tree, allocating all your new
> > > blocks in sequential order on disk somewhere (anywhere, unlike a LFS),
> > > and now all you need to do to make the entire new set of writes
> > > visible after a reboot is to move the root node pointer for the
> > > filesystem from the old root block to the new one. It's a beautiful
> > > mechanism for achieving transactional consistency, and it lends itself
>
> I had not heard the term "phase tree", but from your description it sounds
> exactly like what is often called "shadow paging". Which may make sense for
> file systems, but none of the commercial DB engines use it, probably because
> of the latency issues you mention.
>
> Or have I misunderstood?

I checked and no, it's not 'shadow paging', at least if I can rely on this
description:

  http://lunar.cs.byu.edu/cs453/notes/html-15/tsld015.htm

Some assertions:

  "Disadvantages of shadow page over log-based:
  [...]
     Garbage collection: find all the garbage pages & add them
     to the list of "free" pages, a committed transaction causes
     db pages containing data changed by the transaction
     inaccessible which become garbage (since they are not part
     of the free space & do not contain usable info.)
  [...]
     Difficult to be adapted for concurrent transactions"

Phase tree does not generate garbage, nor is it unsuitable for concurrent
transactions - please see the discussion under the thread "[NFS] Re: NetApps et
al. <- NVRAM" on the linux-nfs list.

Phase tree by itself isn't that great for high-volume low-latency transactions:
you would have to have very short phases to keep latency down, and that would
force quite a few redundant metadata writes. However, the addition of a *small*
journal on disk or in NVRAM would knock the commit latency down to nearly zero
without impacting phase tree's other advantages.
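
For anyone who wants something concrete, here is a minimal sketch of the
mechanism Stephen describes above: every change is written to new blocks all
the way up the tree, and the commit is a single move of the root pointer. It is
a toy model in Python, not Tux2fs code. The names Disk, cow_update and read are
made up for illustration, and the sketch models neither the freeing of
superseded blocks nor the small journal.

```python
class Disk:
    """Toy block store: block number -> contents. Blocks are never overwritten."""

    def __init__(self):
        self.blocks = {}
        self.next_free = 0
        self.root = None  # the one pointer that is switched at commit time

    def alloc(self, contents):
        """Write contents into a fresh block and return its number."""
        n = self.next_free
        self.next_free += 1
        self.blocks[n] = contents
        return n


def cow_update(disk, node, path, value):
    """Store value at path below block `node`, copying each block on the way.

    Returns the number of the new copy of `node`. The old blocks are left
    untouched, so the old tree stays readable until the root is switched.
    """
    if not path:
        return disk.alloc(value)  # new leaf block
    # Copy the interior block (a dict of name -> child block number).
    children = dict(disk.blocks[node]) if node is not None else {}
    children[path[0]] = cow_update(disk, children.get(path[0]), path[1:], value)
    return disk.alloc(children)  # new copy of the interior block


def read(disk, root, path):
    """Follow path from the given root block and return the leaf contents."""
    node = root
    for name in path:
        node = disk.blocks[node][name]
    return disk.blocks[node]


if __name__ == "__main__":
    disk = Disk()
    disk.root = cow_update(disk, None, ["dir", "a"], "old data")

    # Build the next phase next to the current one.
    new_root = cow_update(disk, disk.root, ["dir", "a"], "new data")

    # Until the root moves, a "reboot" would still find the old tree.
    print(read(disk, disk.root, ["dir", "a"]))

    disk.root = new_root  # the commit: one pointer move
    print(read(disk, disk.root, ["dir", "a"]))
```

Note that the second update allocates a new block at every level of the path,
which is where the redundant metadata writes come from if phases are kept very
short.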

-- 
Daniel



