Re: [Linux-cluster] Re: GFS, what's remaining

From: Daniel Phillips
Date: Sun Sep 04 2005 - 14:49:36 EST


On Sunday 04 September 2005 03:28, Andrew Morton wrote:
> If there is already a richer interface into all this code (such as a
> syscall one) and it's feasible to migrate the open() tricksies to that API
> in the future if it all comes unstuck then OK. That's why I asked (thus
> far unsuccessfully):
>
> Are you saying that the posix-file lookalike interface provides
> access to part of the functionality, but there are other APIs which are
> used to access the rest of the functionality? If so, what is that
> interface, and why cannot that interface offer access to 100% of the
> functionality, thus making the posix-file tricks unnecessary?

There is no such interface at the moment, nor is one needed in the immediate
future. Let's look at the arguments for exporting a dlm to userspace:

1) Since we already have a dlm in kernel, why not just export that and save
100K of userspace library? Answer: because we don't want userspace-only
dlm features bulking up the kernel. Answer #2: the extra syscalls and
interface baggage serve no useful purpose.

2) But we need to take locks in the same lockspaces as the kernel dlm(s)!
Answer: only support tools need to do that. A cut-down locking api is
entirely appropriate for this.

3) But the kernel dlm is the only one we have! Answer: easily fixed, a
simple matter of coding. But please bear in mind that dlm-style
synchronization is probably a bad idea for most cluster applications,
particularly ones that already do their synchronization via sockets.

In other words, exporting the full dlm api is a red herring. It has nothing
to do with getting cluster filesystems up and running. It is really just
marketing: it sounds like a great thing for userspace to get a dlm "for
free", but it isn't free. It contributes to kernel bloat, and it isn't even
the most efficient way to do it.

If, after considering that, we _still_ want to export a dlm api from the
kernel, then can we please take the necessary time and get it right? The full
api requires not only syscall-style elements, but asynchronous events as
well, similar to aio. I do not think anybody has a good answer to this today,
nor do we even need it to begin porting applications to cluster filesystems.
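
To make the two halves of such an api concrete, here is a minimal sketch in
Python. It is a toy, in-process model and not any real dlm interface: the
class ToyLockSpace and the names request, release and poll_events are all
hypothetical, and it handles exclusive locks only, with none of the lock
modes or blocking notifications a real dlm would need. The point is only the
shape: request() returns immediately, like a syscall, while the grant shows
up later as an event that the caller has to collect, much as with aio.

```python
from collections import deque


class ToyLockSpace:
    """Hypothetical in-process stand-in for a lockspace (not a real dlm API)."""

    def __init__(self):
        self.holder = {}        # resource name -> current owner
        self.waiters = {}       # resource name -> deque of waiting owners
        self.events = deque()   # completion events not yet collected

    def request(self, owner, resource):
        """Syscall-style element: submit a request and return at once."""
        if resource not in self.holder:
            self.holder[resource] = owner
            self.events.append((owner, resource, "granted"))
        else:
            # Lock is busy: queue the request, completion arrives later.
            self.waiters.setdefault(resource, deque()).append(owner)

    def release(self, owner, resource):
        """Drop a lock and hand it to the next waiter, if there is one."""
        if self.holder.get(resource) != owner:
            raise ValueError("release by a non-holder")
        queue = self.waiters.get(resource)
        if queue:
            next_owner = queue.popleft()
            self.holder[resource] = next_owner
            self.events.append((next_owner, resource, "granted"))
        else:
            del self.holder[resource]

    def poll_events(self):
        """Asynchronous element: collect completions some time later."""
        collected = list(self.events)
        self.events.clear()
        return collected


lockspace = ToyLockSpace()
lockspace.request("a", "r1")        # free, so granted right away
lockspace.request("b", "r1")        # busy, so queued with no event yet
first = lockspace.poll_events()
lockspace.release("a", "r1")        # passes the lock on to "b"
second = lockspace.poll_events()
```

Even in this toy the awkward part is visible: the caller needs some way to
wait for and collect events, which is exactly the piece a kernel interface
would have to get right.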

Oracle guys: what is the distributed locking API for RAC? Is the RAC team
waiting with bated breath to adopt your kernel-based dlm? If not, why not?

Regards,

Daniel