Re: elevator messages in 2.3.50

From: Jamie Lokier (lk@tantalophile.demon.co.uk)
Date: Thu Mar 09 2000 - 08:51:46 EST


Andrea Arcangeli wrote:
> >A possible improvement would be a small holdoff time between one request
> >and the next, if a large seek would be involved. The idea here is to
> >support applications which do a sequence of reads in a local region,
> >where each read is not queued until the previous one completes. The
> >hottest example is page-in but there are others.
>
> This is usually handled by readahead (we do readahead also during paging).

Readahead only handles the case where the next read is... ahead!

Even generalised, it can only handle the case where you can predict the
next block.

No, I suggested holdoff to handle those cases where you _cannot_ predict
the next block to be read.

This came up while considering how to ensure an interactive process can
be run efficiently despite background processes hogging the disk.

But since I thought some more, I realised it is a general problem: there
are cases where you don't know what block would be read next, where a
small holdoff time (less than 100 microseconds) would improve the
overall I/O rate.

For paging, readaround still does not predict the early sequence of
accesses when a program starts, and it does not help when there is
enough memory pressure to reduce the resident set to the working set.

Both these things cause interactive programs to run slowly when a system
is overloaded, and might even slow down the overall execution time of
all the background process.

> >The holdoff time would be just enough to permit an application to queue
> >the next request, if it is going to do that immediately. E.g. I have
> >process A doing lots of I/O. Details not important.
> >
> >I also have process B. It reads something, sleeps, wakes up and quickly
> >issues a read for the next thing. That might be via paging or explicit
> >reads.
> >
> >To minimise overall seek time, it is probably better to _not_ schedule
> >an I/O from process A in the short time between process B's two
> >requests, _if_ A's request would imply a large seek. However, the only
>
> Definitely. And A's request almost always imply a seek. By using
> readahead, B should make the two requests at the same time.

This is true. I didn't mention it because I thought it was obvious :-)
The holdoff is to deal with cases when readahead does _not_ predict the
next read.

> >way to do that is to let the device idle for a short time. Sure it's
> >idle, but overall seek time is reduced.
>
> Hans told me about something like that last month. That can make sense for
> the indirect blocks where we can't do readahead on all the lower level
> indirect blocks but we know that if the fs did a good job the indirect
> blocks will be near each other.
>
> But there's a first problem that it looks too much dependent on the
> timings (we need to know about the timings of a sync-read).

Yes, some kind of self balancing algorithm would be best. But note,
that the minimum useful holdoff time is also a function of the CPU
speed: it should be just long enough for an I/O completion to wake up a
task, that task to run and issue another request. That time is
independent of the device sync-read time.

> Then it's possible to do that only with fs support that gives an hint
> to the elevator, otherwise you have _no_way_ to know there will be a
> request any time soon (and waiting without knowing there will be a
> near read veyr soon is not an option). Then with a lvm/raid array
> under you may do a very wrong thing delaying the very far seek because
> there could be no hardware seek but only a disk change. As last thing
> rescheduling may cause delay due high cpu load etc...

I think you misunderstand. A hint is something completely different. I
propose a small holdoff of the order of a hundred microseconds (I didn't
say it was easy to implement) when the next I/O is likely to cause a
large seek and the run queue is non-empty.

Ideally the exact time and/or seek distance threshold which triggers it
would be self-tuning.

It is to handle cases where you simply _don't know_ where the next read
will be (so you can't do readahead), and you don't know _if_ there will
be a next read (that's why it's a small holdoff). That case is quite
common during paging; less so for normal file operations except perhaps
sequences of stat() calls.

I assert that a short delay, if you can implement it, will improve
overall I/O throughput as well as paging performance.

> So basically I believe it's a kernel-bloat-hack not worty to implement.
>
> And implementing extents in the filesystem is going to take care of the
> indirect blocks issue anyway.

But not the issue I'm talking about.

-- Jamie

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Mar 15 2000 - 21:00:15 EST