Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view"expressedby kernelnewbies.org regarding reiser4 inclusion]
From: David Masover
Date: Tue Aug 01 2006 - 00:55:06 EST
David Lang wrote:
On Mon, 31 Jul 2006, David Masover wrote:
And perhaps a
really good clustering filesystem for markets that
require NO downtime.
Thing is, a cluster is about the only FS I can imagine that could
reasonably require (and MAYBE provide) absolutely no downtime.
Everything else, the more you say it requires no downtime, the more I
say it requires redundancy.
Am I missing any more obvious examples where you can't have enough
redundancy, but you can't have downtime either?
just becouse you have redundancy doesn't mean that your data is idle
enough for you to run a repacker with your spare cycles.
Then you don't have redundancy, at least not for reliability. In that
case, you have redundancy for speed.
to run a
repacker you need a time when the chunk of the filesystem that you are
repacking is not being accessed or written to.
Reasonably, yes. But it will be an online repacker, so it will be
somewhat tolerant of this.
it doesn't matter if that
data lives on one disk or 9 disks all mirroring the same data, you can't
just break off 1 of the copies and repack that becouse by the time you
finish it won't match the live drives anymore.
Aha. That really depends how you're doing the mirroring.
If you're doing it at the block level, then no, it won't work. But if
you're doing it at the filesystem level (a cluster-based FS, or
something that layers on top of an FS), or (most likely) the
database/application level, then when you come back up, the new data is
just pulled in from the logs as if it had been written to the FS.
The only example I can think of that I've actually used and seen working
is MySQL tables, but that already covers a huge number of websites.
database servers have a repacker (vaccum), and they are under tremendous
preasure from their users to avoid having to use it becouse of the
performance hit that it generates. (the theory in the past is exactly
what was presented in this thread, make things run faster most of the
time and accept the performance hit when you repack). the trend seems to
be for a repacker thread that runs continuously, causing a small impact
all the time (that can be calculated into the capacity planning) instead
of a large impact once in a while.
Hmm, if that could be done right, it wouldn't be so bad -- if you get
twice the performance but have to repack for 2 hrs at the end of the
week, repacker is better, right? So if you could spread the 2 hours out
over the week, in theory, you'd still be pretty close to twice the
performance.
But that is fairly difficult to do, and may be more difficult to do well
than to implement, say, a Reiser4 plugin that operates about on the
level of rsync, but on every file modification.
the other thing they are seeing as new people start useing them is that
the newbys don't realize they need to do somthing as archaic as running
a repacker periodicly, as a result they let things devolve down to where
performance is really bad without understanding why.
Yikes. But then, that may be a failure of distro maintainers for not
throwing it in cron for them.
I had a similar problem with MySQL. I turned on binary logging so I
could do database replication, but I didn't realize I had to actually
delete the logs. I now have a daily cron job that wipes out everything
except the last day's logs. It could probably be modified pretty easily
to run hourly, if I needed to.
Moral of the story? Maybe there's something to this "continuous
repacker" idea, but don't ruin a good thing for the rest of us because
of newbies.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/