Re: "Elevator ordering" and other oddities

Thomas Schoebel-Theuer (schoebel@minnie.informatik.uni-stuttgart.de)
22 Aug 1996 13:17:52 GMT


> > Note that *if* you interrupt a "consecutive" chain, it nearly doesn't matter
> > which other request you choose:
>
> Wrong --- modern fast disks may have an average disk seek time of
> around 8ms, but the adjacent track to track time can be as low as
> 1.5ms and the edge-to-edge time as much as 14ms on these disks. That
> makes a *lot* of difference.

You are right that adjacent sectors crossing a track boundary can be
processed with very low latency -- but that is exactly the case of
"consecutive" sectors I was talking about. Now let there be a gap of,
say, 50 sectors. Then you have to wait for the right sector to come
around, whether it is on the same track or on another one. The problem
is that the rotational delay is on *average* (that is, with gaps of
random length) about 4 ms, ranging from 0 to 8 ms, while the seek
times range from 1 ms to 14 ms. So a "short seek" will take about 5 ms
on average (corresponding to perhaps 60 sectors), while long seeks
take about 18 ms (corresponding to about 200 sectors).
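
Just to illustrate the shape of the delay function with these round
figures (a sketch only; the constants are the numbers quoted above,
not measurements):

    /* Rough model of the delay as a function of gap size, using the
     * round figures from above.  Note the big step between the
     * "consecutive" case and everything else, and the comparatively
     * slow growth afterwards. */
    double delay_ms(long gap_sectors)
    {
        if (gap_sectors == 0)
            return 0.1;         /* consecutive: transfer time only */
        if (gap_sectors <= 60)
            return 4.0 + 1.0;   /* avg. rotation + short seek: ~5 ms */
        return 4.0 + 14.0;      /* avg. rotation + long seek: ~18 ms */
    }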

> > I find it essential to stress the non-continuity of the delay time
> > function depending on the "gap size": it has a big step at the beginning
> > and then increases relatively slow.
>
> It increases slowly, yes, but it keeps on increasing for a long time
> --- modern disks have got a lot of cylinders, and there is a big
> difference between near and far seeks.

Yes, of course. My point was that leaving a consecutive chain may cost
a factor of 20 to 200 in latency (roughly 0.1 to 0.25 ms for a
consecutive sector versus 5 ms for a short seek), while the difference
between short and long seeks is "only" a factor of 3 or so (5 ms
versus 18 ms).

> > The problem is to find a criterion _when_ to leave the consecutive
> > chain and to prefer an elder request because of its time
> > requirements/priority.
>
> Not just that: the problem is whether to bother doing this at all. It
> depends on whether or not optimising throughput is the main aim, and
> there is a very good argument that it should be. The trouble is that
> if we reprioritise things so that large writes don't starve things
> like demand loading so much, then there are two implications for the
> demand loading processes: first, they get higher throughput while the
> contention between the two workloads persists; but secondly, that
> contention will persist for longer, since you are slowing up the other
> process. It is not at all clear that you necessarily improve
> performance at all by doing this; it's a question of balance.

Well, that's a problem any heuristic has: it works "well" in certain
situations, but worse in others. The "chunk idea" is simply to use the
available information (the set of all pending requests, possibly with
additional read-ahead requests, enriched with priority information to
ensure some minimum fairness) and to process it as quickly as possible
(that is, with as high a throughput as possible). It cannot predict
the future, i.e. which other requests will come in. But it says "hey,
I have to process sector 27000 at once because of its high priority,
and sector 26999 later because of its very low priority; doing both in
ascending order costs nearly no overhead, while doing 27000 first and
26999 later (possibly after a head movement) would be a pity."
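
As a sketch of that rule (the names are made up for illustration, not
taken from any real driver): once any request in a consecutive chunk
is urgent enough, the whole chunk is served in ascending sector order.

    /* Sketch: once any request in a consecutive chunk is urgent,
     * serve the whole chunk in ascending sector order, taking the
     * low-priority neighbours "with it on the fly". */
    struct request {
        long sector;
        int priority;            /* higher value = more urgent */
        struct request *next;    /* next request in the same chunk,
                                    kept sorted by ascending sector */
    };

    extern void process_request(struct request *r);  /* driver hook */

    void maybe_serve_chunk(struct request *chunk, int threshold)
    {
        struct request *r;
        int urgent = 0;

        for (r = chunk; r; r = r->next)
            if (r->priority >= threshold)
                urgent = 1;
        if (!urgent)
            return;              /* nothing forces this chunk yet */

        /* e.g. sectors 26999 (low priority) and 27000 (high
         * priority) are both served now, in ascending order */
        for (r = chunk; r; r = r->next)
            process_request(r);
    }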

> > Another interesting strategy would be to generally bundle all requests into
> > _chunks_ of "consecutive" sectors, and to simply prefer the longest chunk
> > (if possible, without compromising fairness too much).
>
> No, we really do need to optimise disk head movement. Also, fairness
> would indeed be a big problem.

It depends on the access patterns of the processes. If they do mostly
asynchronous I/O, there is a good chance that shorter chunks will grow
while a larger one is being processed. Maybe just choosing the longest
chunk is too simple a strategy -- but I think it's worth reasoning
about the impact of "chunk length" on overall throughput.

Maybe "chunk length" could serve as an additional criterion when two
chunks have "nearly" the same distance from the current head position,
just for an example.
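
A sketch of such a selection rule (the tie margin is an arbitrary
example value):

    #include <stdlib.h>          /* for labs() */

    /* Sketch: head distance stays the primary criterion; chunk
     * length only decides "near ties". */
    struct chunk {
        long start_sector;
        long length;             /* number of consecutive sectors */
    };

    #define TIE_MARGIN 16        /* "nearly the same distance" */

    struct chunk *pick_chunk(struct chunk *a, struct chunk *b,
                             long head)
    {
        long da = labs(a->start_sector - head);
        long db = labs(b->start_sector - head);

        if (labs(da - db) > TIE_MARGIN)
            return (da < db) ? a : b;  /* clear winner by distance */
        return (a->length >= b->length) ? a : b;  /* tie: longer one */
    }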

> > My argument is that if, say, the *last* request of a chunk is the eldest
> > one that should be preferred because of its age, it would be a pity
> > to not serve all the others in the same chunk just because their
> > age doesn't force it. Although elevators would process the whole
> > chunk also very likely (but in reverse order), the real difference shows up
> > if the eldest request is in the middle: then doing the whole chunk in
> > ascending order regardless of the position of the forcing request is much
> > better than elevators would do on single request level.
>
> I don't believe that that is true. Yes, the elevator can let the
> oldest requests build up for a while, but it will always get there
> reasonably quickly, and I don't think it is a viable alternative to do
> the scheduling based on age priority and not on optimisations of the
> head movement.

I don't understand you there. You can apply any known strategy at the
chunk level just as well as conventionally at the sector level. I hope
we agree that optimization of head movement is a very important goal,
one that should be violated only relatively seldom for the sake of
fairness. Because of those violations, you have to use additional
"priorities" in some way or another, whether stored explicitly or
computed implicitly on the fly. So there are (rare) cases where an
elevator will prefer a request from the middle of some consecutive
chain, or will at least process a consecutive chain in descending
order.
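
For instance, a classical one-directional elevator carries over to
the chunk level almost unchanged (again just a sketch with made-up
names):

    /* Sketch: an elevator sweep over whole chunks instead of single
     * sectors.  Within a chunk we always go in ascending order, even
     * if the oldest (forcing) request sits in its middle. */
    struct chunk {
        long start_sector;
        struct chunk *next;      /* list kept sorted by start_sector */
    };

    struct chunk *next_chunk(struct chunk *sorted, long head_pos)
    {
        struct chunk *c;

        for (c = sorted; c; c = c->next)
            if (c->start_sector >= head_pos)
                return c;        /* first chunk above the head */
        return sorted;           /* wrap around, like C-SCAN */
    }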

My point is that adding the concept of "chunks" to a conventional
algorithm will have a (small, but worthwhile) side effect on what the
conventional algorithm would have done at the sector level. Using
chunks in addition will do nearly the same things as the algorithm you
dream of, but in some cases it will "take with it on the fly" some
sectors that cause *nearly no* overhead when processed on the same
occasion. I say "nearly no" because the speedup factor for
"consecutive" sectors is about 20 or 50 even when compared with "short
seeks".

> > If you have "chunks" implemented anyway, it is relatively easy to readahead
> > beyond their actual length. The wasted space is limited if the
> > readahead is limited to a certain percentage of the chunk length.
>
> What you need to do is anticipate _what_ data is going to be needed.
> It is not guaranteed to be consecutive on disk (although we try to
> ensure that wherever possible), and when we get to the end of a file
> we _know_ we won't want any readahead. Readahead has got to be done
> at a higher level than the request handler, so it's not really
> something we want to do via chunks.

If you want to save the effort of collecting the information you need
for a high-level model, the chunks can be used as a heuristic for
readahead at a low level. I think this is very easy to implement
compared to a high-level model.
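
A sketch of that low-level heuristic (both constants are arbitrary
example values):

    /* Sketch: read ahead past the end of a chunk, but bound the
     * possibly wasted space by a percentage of the chunk length. */
    #define READAHEAD_PERCENT 25
    #define READAHEAD_MAX     64     /* sectors */

    long readahead_sectors(long chunk_length)
    {
        long extra = chunk_length * READAHEAD_PERCENT / 100;

        if (extra > READAHEAD_MAX)
            extra = READAHEAD_MAX;   /* don't flood the buffer cache */
        return extra;
    }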

Anyway, if you have chunks implemented already and use a high-level
model, you have additional hints about where (and how far) readahead
can be done at nearly no additional cost (in terms of disk I/O), and
where such readahead may be counter-productive because of its seek
overhead or memory consumption. For example, suppose your high-level
information says to do readahead at the end of some request chain.
Then you could determine the length of that readahead at the driver
level, possibly using additional memory usage information from the
buffer cache management.

Also, there are cases where you don't have exact high-level
information, for example in paging. You always know from the higher
level that some request is a paging request -- but that's all.
Determining the length of the readahead should then be done at the
driver level, for example with much larger readahead chunks on a
256 MB machine than on a 4 MB machine.
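
As a sketch (the scaling factor and the limits are invented for
illustration):

    /* Sketch: scale paging readahead with the machine's memory
     * size.  The divisor and the limits are invented values. */
    long paging_readahead(long total_mem_mb)
    {
        long sectors = total_mem_mb / 4;  /* 4 MB -> 1, 256 MB -> 64 */

        if (sectors < 1)
            sectors = 1;
        if (sectors > 128)
            sectors = 128;
        return sectors;
    }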

-- Thomas