Re: POHMELFS high performance network filesystem. Transactions, failover, performance.
From: Jeff Garzik
Date: Wed May 14 2008 - 15:04:00 EST
Evgeniy Polyakov wrote:
Hi Sage.
On Wed, May 14, 2008 at 06:35:19AM -0700, Sage Weil (sage@xxxxxxxxxxxx) wrote:
What is your opinion of the Paxos algorithm?
It is slow. But it does solve failure cases.
For writes, Paxos is actually more or less optimal (in the non-failure
cases, at least). Reads are trickier, but there are ways to keep that
fast as well. FWIW, Ceph extends basic Paxos with a leasing mechanism to
keep reads fast, consistent, and distributed. It's only used for cluster
state, though, not file data.
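As a rough illustration of the leasing idea mentioned above, here is a minimal sketch of a generic read lease. It is not Ceph's implementation; the names `LeasedReplica`, `grant_lease`, `read` and `safe_to_commit_write` are hypothetical, and clocks are assumed to be perfectly synchronized, which a real system cannot assume.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LeasedReplica:
    """Hypothetical replica that may serve reads locally while it holds a lease."""
    value: str
    lease_expiry: float = 0.0  # local reads are allowed strictly before this time

    def grant_lease(self, now: float, duration: float) -> None:
        # The leader extends the lease; assumed to happen only when the
        # replica's state is known to be current.
        self.lease_expiry = now + duration

    def read(self, now: float) -> Optional[str]:
        # Fast path: answer locally, no consensus round needed.
        if now < self.lease_expiry:
            return self.value
        # Lease lapsed: the caller has to fall back to the slow path.
        return None


def safe_to_commit_write(replicas: List[LeasedReplica], now: float) -> bool:
    # A new value may only be committed once no replica can still be
    # serving the old one under an unexpired lease.
    return all(now >= r.lease_expiry for r in replicas)
```

The trade-off in this sketch is that reads are cheap while leases are valid, whereas a write has to wait for outstanding leases to lapse (or, in a real system, be revoked).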
Well, it depends... If we are talking about single-node performance,
then any protocol that requires waiting for authorization (or any
approach that waits for an acknowledgement right after the data is sent)
is slow.
Quite true, but IMO single-node performance is largely an academic
exercise today. What production system is run without backups or
replication?
If we are talking about aggregate parallel performance, then its basic
protocol with 2 messages is (probably) optimal, but I'm still not
convinced that the 2-message case is a good choice. I want one :)
I think part of Paxos' attraction is that it is provably correct for the
chosen goal, which historically has not been true for hand-rolled
consensus algorithms often found these days.
There are a bunch of variants (Fast Paxos, Byzantine Paxos, Fast
Byzantine Paxos, etc.) based on Classical Paxos which make
improvements in the performance/latency areas. There is even a Paxos
Commit which appears to be more efficient than the standard
two-phase commit used for transactions by several existing clustered
databases.
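For anyone following the message-count argument, here is a minimal sketch of single-decree Classical Paxos. It assumes an in-process, synchronous setting with no failures or networking; the names `Acceptor`, `on_prepare`, `on_accept` and `propose` are illustrative only and not taken from any existing implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                          # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) last accepted

    def on_prepare(self, ballot: int):
        """Phase 1: promise to ignore lower ballots, report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def on_accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept unless a higher ballot has already been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run one proposer round; return the chosen value, or None without a quorum."""
    quorum = len(acceptors) // 2 + 1

    # Phase 1: prepare / promise.
    replies = [a.on_prepare(ballot) for a in acceptors]
    promises = [prev for ok, prev in replies if ok]
    if len(promises) < quorum:
        return None

    # If any acceptor already accepted a value, the proposer must adopt the
    # one with the highest ballot instead of its own.
    prior = [p for p in promises if p is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]

    # Phase 2: accept / accepted.
    acks = sum(a.on_accept(ballot, value) for a in acceptors)
    return value if acks >= quorum else None
```

With a stable leader, phase 1 is commonly amortized over many rounds, leaving only the phase 2 exchange per write; that may be the two-message case being discussed.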
Jeff