Weirdness in block device queues.

From: Eric Youngdale (eric@andante.org)
Date: Thu Sep 07 2000 - 11:16:16 EST


    Doug Gilbert and I ran across some weirdness in the way the block device queues are plugged and unplugged. It turned up in some benchmarks of the SCSI generics driver: with the new queueing code, the generics driver inserts its requests into the same queue that block device requests are inserted into.

    The oddness is this. We were observing stalls in the processing of commands that were traced to the fact that the queue had remained plugged for an excessive amount of time. The stalls last for about 5 seconds.

    Some investigation revealed that part of the answer is that the bdflush daemon essentially forces a bunch of dirty pages to be written to disk, but never bothers to unplug the queue when it is done. The result is that the queue remains plugged until someone else comes along and unplugs it. As it turns out, kupdate() does unplug the queue, and kupdate runs every 5 seconds or so.

    Patching bdflush to run tq_disk after flushing buffers (i.e. before the schedule()) fixed *most* of the problem, but evidently not all of it (Doug was still observing stalls, but a lot less frequently). In other words, there is someone else out there queueing requests in such a way that the queue can remain plugged for some amount of time.
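
    To make the timing concrete, here is a rough user-space sketch of the behaviour described above. It is a toy model and not kernel code: the names toy_queue, toy_add_request, toy_unplug and toy_stall_seconds are all made up for illustration, plugging is simplified to "the queue plugs when the first request lands on an empty queue", and kupdate is assumed to fire at exact multiples of 5 seconds. It only shows how a flusher that never unplugs leaves requests waiting for the next periodic unplug.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a plugged request queue; every name here is hypothetical. */
#define KUPDATE_INTERVAL 5   /* assumed seconds between kupdate runs */

struct toy_queue {
    bool plugged;   /* true while queued requests are being held back */
    int  pending;   /* requests queued but not yet released */
};

/* Simplification: adding to an empty queue plugs it. */
static void toy_add_request(struct toy_queue *q)
{
    if (q->pending == 0)
        q->plugged = true;
    q->pending++;
}

/* Unplugging releases everything currently held in the queue. */
static void toy_unplug(struct toy_queue *q)
{
    q->plugged = false;
    q->pending = 0;
}

/*
 * A flusher queues `nreq` requests at time `start` (in seconds).
 * If `flusher_unplugs` is false, the queue stays plugged until the
 * next kupdate tick, modelled as every multiple of KUPDATE_INTERVAL.
 * Returns how many seconds the requests sat in the plugged queue.
 */
static int toy_stall_seconds(int start, int nreq, bool flusher_unplugs)
{
    struct toy_queue q = { false, 0 };
    int t = start;

    assert(start >= 0 && nreq > 0);

    for (int i = 0; i < nreq; i++)
        toy_add_request(&q);

    if (flusher_unplugs)
        toy_unplug(&q);          /* the patched behaviour */

    while (q.plugged) {          /* wait for someone else to unplug */
        t++;
        if (t % KUPDATE_INTERVAL == 0)
            toy_unplug(&q);      /* kupdate comes along */
    }
    return t - start;
}
```

    In this model, toy_stall_seconds(0, 3, false) waits the full interval before the requests are released, while the same call with flusher_unplugs set to true returns 0. It does not model the remaining stalls that Doug still saw after the patch.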

    My gut tells me that it is wrong for bdflush not to unplug the queue when it is done queueing I/O requests. My gut also tells me that the generics driver probably wants to unplug the one specific queue that it is using, to ensure that its I/O gets issued to the device right away (it doesn't make sense to unplug all queues in this instance).
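
    The difference between the two kinds of unplug can be sketched in user space as follows. Again this is only an illustration under assumptions: sketch_queue, sketch_add_request, sketch_unplug_one and sketch_unplug_all are hypothetical names, and sketch_unplug_all is just a loose stand-in for what running tq_disk amounts to in this simplified model, not the kernel's actual mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue; not the kernel's structure. */
struct sketch_queue {
    bool plugged;     /* requests are being held back */
    int  pending;     /* requests waiting in the queue */
    int  dispatched;  /* requests released so far */
};

/* Simplification: adding to an empty queue plugs it. */
static void sketch_add_request(struct sketch_queue *q)
{
    if (q->pending == 0)
        q->plugged = true;
    q->pending++;
}

/* Unplug a single queue; returns the number of requests released. */
static int sketch_unplug_one(struct sketch_queue *q)
{
    int released = q->pending;

    q->dispatched += released;
    q->pending = 0;
    q->plugged = false;
    return released;
}

/*
 * Unplug every plugged queue in the array; returns how many queues
 * were actually plugged. A caller that only cares about one queue
 * ends up kicking all the others as a side effect.
 */
static size_t sketch_unplug_all(struct sketch_queue *qs, size_t n)
{
    size_t touched = 0;

    assert(qs != NULL);

    for (size_t i = 0; i < n; i++) {
        if (qs[i].plugged) {
            sketch_unplug_one(&qs[i]);
            touched++;
        }
    }
    return touched;
}
```

    With three plugged queues, calling sketch_unplug_one on the one queue a driver is using releases only that queue's requests and leaves the other two plugged, whereas sketch_unplug_all releases all of them regardless of who asked.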

    Comments?

-Eric



