I/O System Overhaul

Andrew E. Mileski (aem@netcom.ca)
Sun, 24 Aug 1997 13:13:33 -0400 (EDT)


> > list when it was active. He was all for the generic mid-layer
> > (which would replace what we currently know as ll_rw_blk.c) so
> > that we could
>
> He was all for removing the SCSI one tho
>
> Right. Which reminds me about Eric Youngdale's massive scsi hacks to
> improve error handling and to do end request processing in bh_handlers
> to further simplify things. I want to make sure that gets in for
> 2.2.0.

I'll add my 2 bits for loop driver requirements:
- I/O system must be re-entrant (the current one _almost_ is).
  I had to kludge a fix for this in my loop driver overhaul patch.

In more detail:
Request headers (or the equivalent) need to be counted. Two counts
should be maintained - request headers free for reads, and for writes.
All drivers must allocate _and_ free request headers through the same
mechanism, so that the counts stay accurate - and in the case of a
broken driver, a last-gasp effort can still walk all the requests and
recount them. The request allocator must be aware that a single
request can spawn multiple further requests, and must either be
prepared to satisfy _all_ of them or put the requestor to sleep. This
means there has to be a way of growing/shrinking the pool of request
headers and/or reserving them. All of this also applies to buffers of
course, as every request also requires buffers.
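The counting scheme above could be modeled roughly like this - a
user-space sketch, not kernel code; the names (req_pool, req_reserve,
req_release) are made up for illustration:

```c
#include <stdbool.h>

enum req_kind { REQ_READ = 0, REQ_WRITE = 1 };

/* Two free counts, one per request kind, as described above. */
struct req_pool {
	int free[2];	/* free request headers: [REQ_READ], [REQ_WRITE] */
};

/* Atomically reserve n request headers of one kind, or fail.
 * In the kernel the failure path would put the requestor to
 * sleep until enough headers are freed, instead of returning. */
static bool req_reserve(struct req_pool *p, enum req_kind k, int n)
{
	if (p->free[k] < n)
		return false;	/* caller must sleep and retry */
	p->free[k] -= n;
	return true;
}

/* Every driver frees through the same routine, keeping counts honest. */
static void req_release(struct req_pool *p, enum req_kind k, int n)
{
	p->free[k] += n;
}
```

The key point is that reservation is all-or-nothing: a request that
will spawn several more must grab every header it needs up front.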

Loop device example:
The loop driver is a re-entrant beast, especially when a loop device is
itself looped (up to a depth of NR_LOOP = 8). Being re-entrant, a loop
device needs two request headers (one for the original request, one for
the loop-generated request). If two requests of the required type (read
or write) are not available, the driver must sleep.

In the worst case, when all loop devices are looped, NR_LOOP + 1
requests are required for EVERY request made on that loop device!
If there are not enough requests available, the kernel can end up
in a deadlock - waiting for requests to finish (to free up space)
that cannot finish until another request finishes first.

The solution is simple - don't start a request unless there are enough
requests available for the depth of the loop device. I had to add a
mechanism to the kernel and loop driver to check this depth before
starting to process a request, and sleep if necessary (not enough free
requests). Works great!
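The arithmetic of the depth check can be sketched as follows (an
illustrative model, not the actual patch; the function names are
hypothetical):

```c
#include <stdbool.h>

#define NR_LOOP 8	/* maximum loop-on-loop nesting depth */

/* A request on a loop device stacked `depth` devices deep needs one
 * request header per loop level plus one for the final request
 * against the underlying real device: depth + 1 in total. */
static int loop_requests_needed(int depth)
{
	return depth + 1;
}

/* True if the request may start now; otherwise the caller should
 * sleep until enough request headers have been freed, rather than
 * start a chain it cannot finish (the deadlock described above). */
static bool loop_may_start(int free_requests, int depth)
{
	return free_requests >= loop_requests_needed(depth);
}
```

So in the worst case (depth NR_LOOP = 8), nine free request headers
must exist before a single request is allowed to begin.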

All of the above for request headers also applies to buffers.

Now when bdflush triggers...more requests are needed! Boom! Unless
bdflush either skips loop device buffers, or goes back to sleep.
It should probably skip them, but that isn't a sure-fire solution
when all the buffers are loop buffers. Of course, bdflush needs to
know if writing a buffer will succeed before actually doing it.

I don't have an easy or good solution to the entire buffer deadlock
problem, except limiting the processing of non-bdflush re-entrant
requests to one per driver (not per device), in the hope that this
will not use up all available buffers. This is terribly inefficient
but it works, though I've not found this severe measure to be
necessary in practice.
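That per-driver limit amounts to a trivial gate, something like the
following sketch (names invented for illustration; in the kernel the
failed case would sleep on a wait queue rather than return):

```c
#include <stdbool.h>

/* One throttle per driver, not per device. */
struct drv_throttle {
	int in_flight;	/* non-bdflush re-entrant requests in progress */
};

/* Admit at most one re-entrant request at a time. */
static bool throttle_try_start(struct drv_throttle *t)
{
	if (t->in_flight >= 1)
		return false;	/* caller sleeps until the current one is done */
	t->in_flight++;
	return true;
}

static void throttle_finish(struct drv_throttle *t)
{
	t->in_flight--;
}
```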

--
Andrew E. Mileski   mailto:aem@netcom.ca