Re: POHMELFS high performance network filesystem. Transactions, failover, performance.
From: Jamie Lokier
Date: Wed May 14 2008 - 17:20:12 EST
Sage Weil wrote:
> > In that model, neighbour sensing is used to find the largest coherency
> > domains fitting a set of parameters (such as "replicate datum X to N
> > nodes with maximum comms latency T"). If the parameters are able to
> > be met, quorum gives you the desired robustness in the event of
> > node/network failures. During any time while the coherency parameters
> > cannot be met, the robustness reduces to the best it can do
> > temporarily, and recovers when possible later. As a bonus, you have
> > some timing guarantees if they are more important.
>
> Anything that silently relaxes consistency like that scares me. Does
> anybody really do that in practice?
I'm doing it on a 2000-node system spread across a country. There are
so many links down at any given time that we have to handle long
stretches of inconsistency, and we have strategies for merging local
changes when possible to reduce manual overhead. But we like
opportunistic consistency, so that people at site A can phone people
at site B and view/change the same things in real time if a path
between them is up and fast enough (great for support and demos);
otherwise their actions are queued or refused, depending on policy.
It makes sense to configure which data and/or operations require
global consistency (or must block), and which data it's OK to modify
locally and merge automatically in a netsplit scenario. Think DVCS
during splits, coherent when possible.
E.g. as a filesystem, during netsplits you might configure the system
to allow changes to /home/* locally if global coherency is down. If
all changes (or generally, transaction traces) to /home/user1 are in
just one coherent subgroup, on recovery they can be distributed
silently to the others, unaffected by changes to /home/user2
elsewhere. But if multiple separated coherent subgroups all change
/home/user1, recovery might be configured to flag them as conflicts,
queue them for manual inspection, and maybe have a policy for the
values used until a person gets involved.
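The recovery rule above (one subgroup touched it: distribute silently;
several subgroups touched it: flag a conflict) can be sketched in a few
lines. This is a toy model of the idea, assuming each change is recorded
as a (subgroup, path) pair during the split:

```python
# Hypothetical sketch of the recovery rule: paths changed in exactly one
# coherent subgroup merge automatically; paths changed in several
# subgroups are queued as conflicts for manual inspection.
from collections import defaultdict

def reconcile(changes):
    """changes: iterable of (subgroup_id, path) pairs logged during a split.
    Returns (auto_merge, conflicts) as two sets of paths."""
    touched_by = defaultdict(set)
    for subgroup, path in changes:
        touched_by[path].add(subgroup)
    auto_merge = {p for p, groups in touched_by.items() if len(groups) == 1}
    conflicts = {p for p, groups in touched_by.items() if len(groups) > 1}
    return auto_merge, conflicts
```

So changes to /home/user1 made only in subgroup A merge silently even if
subgroup B changed /home/user2 at the same time; only a path both groups
touched ends up in the conflict set.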
Or instead of paths you might distinguish on user ids, or by explicit
flags in requests (you should really allow that anyway). Or by
tracing causal relationships requiring programs to follow some rules
(see "virtual synchrony"; the rule is "don't depend on hidden
communications").
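One common mechanism for tracing those causal relationships is a vector
clock (this is my illustration, not something the virtual synchrony
literature mandates): two updates whose clocks are incomparable are
concurrent, i.e. they happened in separate subgroups and are the
candidates for conflict handling.

```python
# Hypothetical sketch: compare two vector clocks, given as
# {node_id: counter} dicts, to classify the causal relationship
# between the updates that carry them.
def compare(vc_a, vc_b):
    """Return 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"   # incomparable: made in separate subgroups
```

The "don't depend on hidden communications" rule is what makes this
sound: any dependency must show up in the clocks, or the comparison
will wrongly call two related updates concurrent.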
That's a policy choice, but in some systems, typically those with many
nodes and fluctuating communications, it's really worth it. It
increases some kinds of robustness at the cost of others.
-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/