Re: elevator algorithm considered irrelevant

Mitchell Blank Jr (mitch@execpc.com)
Tue, 17 Nov 1998 08:06:36 -0600


Larry McVoy wrote:
> The reason is that UFS, at least, did a fantastic job making sure that
> all the data you cared about was close together. So there was little
> or nothing that you could do for the single user work station case,
> the file system had, in effect, already sorted the requests before they
> were sent to the disk.

Unfortunately this assumes two things:
1. A good filesystem (ext2 and UFS are probably good in this regard, other
filesystems or swap may not be)
2. Only one active partition per disk. In the case of multiple partitions
   each getting heavy use, you definitely want to do what you can to
   cluster accesses to avoid seeking heavily between them (a rough sketch
   of the idea follows).
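
To make that concrete, the sorting has to be done on the absolute position
on the disk (partition start + block offset), not on the per-partition
block number. Here's a tiny user-space sketch of the idea -- none of this
is real kernel code; the struct fields and the sample numbers are made up:

/* Toy illustration: requests on different partitions of the same disk
 * should be ordered by their absolute position on the platter, not by
 * their per-partition block number. */
#include <stdio.h>
#include <stdlib.h>

struct req {
    unsigned long part_start;   /* first sector of the partition      */
    unsigned long block;        /* block offset within that partition */
};

static unsigned long abs_pos(const struct req *r)
{
    return r->part_start + r->block;    /* absolute position on the disk */
}

static int cmp_abs(const void *a, const void *b)
{
    unsigned long pa = abs_pos(a), pb = abs_pos(b);
    return (pa > pb) - (pa < pb);
}

int main(void)
{
    /* two busy partitions, requests arriving interleaved */
    struct req q[] = {
        { 0,      900 }, { 500000,  10 },
        { 0,       20 }, { 500000, 700 },
    };
    size_t i, n = sizeof(q) / sizeof(q[0]);

    qsort(q, n, sizeof(q[0]), cmp_abs);   /* cluster by absolute position */

    for (i = 0; i < n; i++)
        printf("sector %lu\n", abs_pos(&q[i]));
    return 0;
}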

You do have a point that people are probably trying too hard. There are
three types of hard drives we can consider:
1. Really old hard drives (say, <1985). This was the heyday of OS-driven
access time optimizations. As anyone who has ever used a "classic"
BSD system surely remembers, you would enter everything about the
physical properties of the drive and the OS and FS would optimize
accesses to the point of knowing what sector would be under the
heads at a particular time. This worked well because drives were
slow (so there was a lot to be gained), drives were stupid (so they
   wouldn't be doing any optimization themselves), and drives were simple
(so it was easy to determine the exact physical characteristics).

Of course, most of us aren't using Fujitsu Eagles anymore, so we
don't really need to optimize this case.

2. Moderately old or cheap drives. Being modern, these devices can't
be easily characterized to the same degree that the old ones could be
   (non-constant sectors-per-track, non-linear seek times, ...) but
   we can still assume that seek time roughly tracks the distance
   between block numbers in most cases. So basic techniques like the
   elevator algorithm may help out (see the sketch after this list).

3. Newer drives. Newer drives can cache a fair number of accesses and
   are likely reordering multiple outstanding requests themselves to
some extent. It still probably doesn't hurt to put some minimal
effort into sorting them beforehand, though -- the number of
requests we're going to issue at once may be small.
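
To make case (2) concrete, the elevator idea really is this simple: keep
the pending queue sorted by block number and insert each new request with
one linear pass, so the queue gets served in one ascending sweep instead
of in arrival order. The toy below is only a sketch of that idea, not the
kernel's actual request-queue code (a real elevator also has to cope with
requests that land behind the current head position, which this ignores):

/* Toy single-direction elevator: a sorted pending queue filled by
 * insertion sort, served front to back. */
#include <stdio.h>

#define QMAX 16

static unsigned long queue[QMAX];
static int qlen;

static void elevator_add(unsigned long block)
{
    int i, j;

    if (qlen == QMAX)
        return;                 /* queue full: a caller would block here */

    /* find the first queued request beyond the new one ... */
    for (i = 0; i < qlen && queue[i] <= block; i++)
        ;
    /* ... and slide the rest down to make room */
    for (j = qlen; j > i; j--)
        queue[j] = queue[j - 1];
    queue[i] = block;
    qlen++;
}

int main(void)
{
    int i;
    unsigned long arrivals[] = { 700, 20, 450, 90, 300 };

    for (i = 0; i < 5; i++)
        elevator_add(arrivals[i]);

    for (i = 0; i < qlen; i++)      /* served in ascending block order */
        printf("%lu ", queue[i]);
    printf("\n");
    return 0;
}

Note that the insertion pass is where the CPU time goes, which is exactly
the cost to keep an eye on when the request list gets long.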

So basically there's no need to get this perfect anymore, since we don't
have nearly as much knowledge about what's going on as the drive itself
does. We're probably in more danger of working too hard at sorting the
list (thus using up valuable CPU time) than of working too little. Any
reasonably correct algorithm will probably do fine. This shouldn't
discourage people from trying to improve it, but they should be careful
to use as little CPU time as possible, especially in the case of long
request lists.

Also, there is the case of RAM-based disks, where any effort spent on
seek optimization is wasted since seek times are zero anyway. Something
to think about.

> [...] I'd
> strongly urge people that want to work on this to instrument the kernel
> and gather some traces from a busy web server (or whatever it is you are
> trying to make go fast) and stare at them for a long time, long enough
> to understand why you are seeing what you are seeing. The reason is
> that the file system's allocation policy, read ahead policy, meta data
> placement, etc., all skew the data.

Everyone should read this paragraph several times before trying to write
any code. It's _VERY_ easy to come up with cases where algorithm A
is {more optimal,cheaper,more fair} than algorithm B given a theoretical
set of accesses. Things tend to be very different in the real world.
Nowhere is that more true than with disk access patterns. Larry makes a very
good point here.
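
As a rough idea of what "staring at the traces" can look like: if your
instrumentation writes out one starting sector per request (that format
is made up here; use whatever your instrumentation actually produces),
even a tiny filter like the one below gives you the average seek
distance, which you can then compare with and without whatever sorting
scheme you're considering:

/* Read one starting sector per line from stdin and report the average
 * distance between consecutive requests. Purely illustrative. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char line[64];
    long prev = -1, sector, total = 0, n = 0;

    while (fgets(line, sizeof(line), stdin)) {
        sector = strtol(line, NULL, 10);
        if (prev >= 0) {
            total += labs(sector - prev);   /* distance from last request */
            n++;
        }
        prev = sector;
    }
    if (n)
        printf("%ld requests, average seek distance %ld sectors\n",
               n + 1, total / n);
    return 0;
}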

-Mitch
