Re: [ANNOUNCE] Minneapolis Cluster Summit, July 29-30

From: Daniel Phillips
Date: Sun Jul 11 2004 - 23:24:56 EST


On Monday 12 July 2004 00:08, Steven Dake wrote:
> On Sun, 2004-07-11 at 12:44, Daniel Phillips wrote:
> OOM conditions are another fact of life for poorly sized systems. If
> a node is within an OOM condition, it should be removed from the
> cluster (because it is in overload, under which unknown and generally
> bad behaviors occur).

You missed the point. The memory deadlock I pointed out occurs in
_normal operation_. You have to find a way around it, or kernel
cluster services win, plain and simple.

> The openais project does just this: If everything goes to hell in a
> handbasket on the node running the cluster executive, it will be
> rejected from the membership. This rejection is implemented with a
> distributed state machine that ensures, even in low memory
> conditions, every node (including the failed node) reaches the same
> conclusions about the current membership and works today in the
> current code. If at a later time the processor can reenter the
> membership because it has freed up some memory, it will do so
> correctly.

Think about it. Do you want nodes spontaneously falling over from time
to time, even though nothing is wrong with them? What does that do to
your 5 nines?

> > > I would rather avoid non-mainline kernel dependencies at this
> > > time as it makes adoption difficult until kernel patches are
> > > merged into upstream code. Who wants to patch their kernel to
> > > try out some APIs?
> >
> > Everybody working on clusters. It's a fact of life that you have
> > to apply patches to run cluster filesystems right now. Production
> > will be a different story, but (except for the stable GFS code on
> > 2.4) nobody is close to that.
>
> Perhaps people skilled in running pre-alpha software would consider
> patching a kernel to "give it a run". I have no doubts about that.
>
> I would posit a guess people interested in implementing production
> clusters are not too interested about applying kernel patches (and
> causing their kernel to become unsupported) to achieve clustering
> support any time soon.

We are _far_ from production, at least on 2.6. At this point, we are
only interested in people who like to code, test, tinker, and be the
first kid on the block with a shiny new storage cluster in their rec
room. And by "we" I mean "you, me, and everybody else who hopes that
Linux will kick butt in clusters, in the 2.8 time frame."

> > > I am doubtful these sort of kernel patches will be merged without
> > > a strong argument of why it absolutely must be implemented in the
> > > kernel vs all of the counter arguments against a kernel
> > > implementation.
> >
> > True. Do you agree that the PF_MEMALLOC argument is a strong one?
>
> out of memory overload is a sucky situation poorly handled by any
> software, kernel, userland, embedded, whatever.

In case you missed it above, please let me point out one more time that
I am not talking about OOM. I'm talking about a deadlock that may come
up even when resource usage is well within limits, and that is inherent
in the basic design of Linux. There is nothing Byzantine about it.

Regards,

Daniel