Re: Scheduled Transfer Protocol on Linux

From: Zachary Amsden (zamsden@cthulhu.engr.sgi.com)
Date: Sun Feb 13 2000 - 00:29:28 EST


> : > OK, so tell me: how many locks does it take to scale up the following to
> : > 16 CPUs:
> : > local disks
> : > local file system
> : > remote file system
> : > processes
> : > networking interfaces and stack
> : >
> : > What do the locks cover? At 16 CPUs, can you keep all the locks straight
> : > in your head? Nope. So what happens when you go into the kernel and add
> : > a feature? You add a lock. What does that do? Increases the number of
> : > locks. What effect does that have? Makes it more likely that you'll add
> : > more locks, because now it is even less obvious what the lock protects.

I still have to argue very much against the feature adding comment. Not all
features require locking, most of the time they can use the basic locking
model already in place.

A lock protects a data structure. If the locking model is specified nicely,
it is also clear what linked data structures this affects, and how any of that
is modified by reference counting. Some cases can be a bit obtuse (i.e.
streams), but that is just a can of worms I don't want to open.

> :
> : Good design can avoid these problems. If it isn't obvious what a lock
> : protects, you should rethink your locking structure.
>
> I notice that you didn't actually answer any of the questions above, just
> a nice hand wave that says "it need not be so". Well, just to indulge
> me, could you please either:
>
> a) answer the questions above, or
> b) show a shipping, production system which demonstrates your claims, or
> c) admit that you don't know the answer.

So to answer A, I take it I should present a summary of all kernel structures
and the locking model involved with each of them? I conceed your point that
one can't keep the whole thing in one's head at once, but it isn't necessary
to do that. The locking model can be clearly formulated and intuitive on each
domain of objects, and that is all I was trying to say. If it can't be done
in a way that scales, then you should rethink that domain of objects, because
there is probably a better approach.

I would answer b, but I don't want to get into that war.

> : I wasn't arguing that 16 way SMP is OK. Everyone knows it isn't.
>
> Geez, and this from the guy who said Linux needs to support "16-64 way SMP".

SMP sucks for greater than 2-4 cpus. It is way too easy to saturate the bus.
I am talking about different architectures than SMP.

> : Are you saying that clusters of small SMP machines are better?
>
> read these:
> http://www.bitmover.com/llnl/smp.pdf
> http://www.bitmover.com/llnl/labs.pdf

Small clusters of machines are excellent for certain applications, like web
and media serving. Transaction processing for enormous databases is another
animal entirely, and I see that falling into the 4-32 CPU category. I haven't
seen concrete results saying this would work better on a cluster or a large
CPU machine.

> : So the locking
> : moves from the kernels to the application layer. You still have the same
> : synchronization concerns, it's just a matter of what layer they are
> : implemented at.
 
> Err, if you had actually done this, you'd find that your statements
> are unsupportable in practice. Please show me an application that has
> anything, even with an order of magnitude, like the number of locks
> taken/released per second in IRIX or Solaris.

A distributed shared database can easily reach the same order of magnitude on
the number of consistancy operations it needs to perform. Not only that, but
your latency on each operation is increased in a cluster. Taking a lock can
spin - you still have that issue in a cluster when you need private access to
an object. But if it doesn't spin, the overhead of taking a lock is very
small, and releasing it is even less.

What do you have against increasing lock granularity to the point it scales up
to 64 or so threads?

-- 
Zachary Amsden  zamsden@engr.sgi.com  (650) 933-6919  09U-510  Core Protocols

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Feb 15 2000 - 21:00:24 EST