Re: [PATCH 1b/7] dlm: core locking

From: Daniel Phillips
Date: Thu Apr 28 2005 - 18:43:31 EST


On Thursday 28 April 2005 08:37, Lars Marowsky-Bree wrote:
> On 2005-04-27T22:52:55, Daniel Phillips <phillips@xxxxxxxxx> wrote:
> > > So we can't deliver it raw membership events. Noted.
> >
> > Just to pick a nit: there is no way to be sure a membership event might
> > not still be on the way to the dead node, however the rest of the cluster
> > knows the node is dead and can ignore it, in theory. (In practice, only
> > (g)dlm and gfs are well glued into the cman membership protocol, and
> > other components, e.g., cluster block devices and applications, need to
> > be looked at with squinty eyes.)
>
> I'm sorry, I don't get what you are saying here. Could you please
> clarify?
>
> "Membership even on the way to the dead node"? ie, you mean that the
> (now dead) node hasn't acknowledged a previous membership which still
> included it, because it died inbetween? Well, sure, membership is never
> certain at all; it's always in transition, essentially, because we can
> only detect faults some time after the fact.

Exactly, and that is what the barriers are for. I like the concept of
barriers a whole lot. We should put this interface on a pedestal and really
dig into how to use it, or even better, how to optimize it.

But for now, as I understand it, a cluster client's view of the cluster world
is entirely via cman events, which include things like other nodes joining
and leaving service groups. (Service groups are another interface we need to
put on a pedestal, and start working on, because right now it's a clean idea,
not thought all the way through.)

> (It'd be cool if we could mandate nodes to pre-announce failures by a
> couple of seconds, alas I think that's a feature you'll only find in an
> OSDL requirement document, rated as "prio 1" ;-)

Heh, I generally think about failing over in less than a second, preferably
much, much less. Maybe you just have to scale your heuristic a little?

> I also don't understand what you're saying in the second part. How are
> gdlm/gfs "well glued" into the CMAN membership protocol, and what are we
> looking for when we turn our squinty eyes to applications...?

Gdlm and gfs are well-glued because Dave and Patrick have been working on it
for years. Other components barely know about the interfaces, let alone use
them correctly. In the end _every component_ of the cluster stack has to do
the dance correctly on every node. We've really only just started on that
path. Hopefully we'll be able to move down it much more quickly now that the
code is coming out of the cathedral.

Regards,

Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/