Re: [rfc][patch 2.6.18-rc7] block: explicit plugging

From: Jens Axboe
Date: Fri Oct 06 2006 - 07:58:07 EST


On Sat, Sep 16 2006, Nick Piggin wrote:
> Hi,
>
> I've been tinkering with this idea for a while, and I'd be interested
> in seeing what people think about it. The patch isn't in a great state
> of commenting or splitting ;) but I'd be interested in feelings about
> the general approach, and whether I'm going to hit any bad problems
> (e.g. with SCSI or IDE).
>
> Nick
>
>
> This is a patch to perform block device plugging explicitly in the submitting
> process context rather than implicitly by the block device.
>
> There are several advantages to plugging in process context over plugging
> by the block device:
>
> - Implicit plugging is only active when the queue empties, so any
> advantages are lost if there is parallel IO occurring. Not so with
> explicit plugging.
>
> - Implicit plugging relies on a timer and watermarks and a kind-of-explicit
> directive in lock_page which directs plugging. These are heuristics and
> can cost performance due to holding a block device idle longer than it
> should be. Explicit plugging avoids most of these issues by only holding
> the device idle when it is known more requests will be submitted.
>
> - This lock_page directive uses a roundabout way to attempt to minimise
> intrusiveness of plugging on the VM. In doing so, it gets needlessly
> complex: the VM really is in a good position to direct the block layer
> as to the nature of its requests, so there is no need to try to hide
> the fact.
>
> - Explicit plugging keeps a process-private queue of requests being held.
> This offers some advantages over immediately sending requests to the
> block device: firstly, merging can be attempted on requests in this list
> (currently only attempted on the head of the list) without taking any
> locks; secondly, when unplugging occurs, the requests can be delivered
> to the block device queue in a batch, thus the lock acquisitions can be
> batched up.
>
> On a parallel tiobench benchmark, of the 800,000 calls to __make_request
> performed, this patch avoids 490,000 (62%) of the queue_lock acquisitions
> by early merging on the private plugged list.
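
(For concreteness, the submission side of this scheme would look roughly
like the sketch below. The names blk_plug_task()/blk_unplug_task() and
the struct layout are illustrative guesses, not necessarily the patch's
actual spelling.)

	struct blk_plug {
		struct list_head list;	/* requests held in this task,
					   touched without queue_lock */
	};

	static void submit_batch(struct bio **bios, int nr)
	{
		struct blk_plug plug;
		int i;

		/* start collecting requests in the current task */
		blk_plug_task(&plug);

		for (i = 0; i < nr; i++)
			/* may merge into the plug's private list, locklessly */
			submit_bio(WRITE, bios[i]);

		/* deliver the whole batch under one queue_lock round */
		blk_unplug_task(&plug);
	}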

Nick, this looks pretty good in general from the vm side, and the
concept is nice for reduced lock bouncing. I've merged this for more
testing in a 'plug' branch in the block repo.

> Testing and development are still in the early stages. In particular,
> the lack of a timer based unplug kick probably breaks some block device
> drivers in funny ways (though it works here for me with SCSI and UML so
> far). It also needs much wider testing.

Your SCSI changes are pretty broken, I've fixed them up. We need some
way of asking the block layer to back off and re-run the queue sometime
soon, which is what the plugging does in that case. I've introduced a
new mechanism for that.
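
That mechanism is the blk_delay_queue() added in the changes below. A
rough sketch of the driver side (the signature and delay value are
assumed, and the example_* helpers are made up; elv_next_request() and
blkdev_dequeue_request() are the existing block layer calls):

	static void example_request_fn(struct request_queue *q)
	{
		struct request *rq;

		while ((rq = elv_next_request(q)) != NULL) {
			if (!example_resources_available(q)) {
				/*
				 * Out of commands/tags: instead of leaning
				 * on the old implicit plug timer, ask the
				 * block layer to re-run the queue shortly.
				 */
				blk_delay_queue(q, 3);
				break;
			}
			blkdev_dequeue_request(rq);
			example_dispatch(rq);
		}
	}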

Changes:

- Don't invoke ->request_fn() in blk_queue_invalidate_tags

- Fixup all filesystems for block_sync_page()

- Add blk_delay_queue() to handle the old plugging-on-shortage
usage.

- Unconditionally run replug_current_nested() in io_schedule() (see the
sketch after this list)

- Merge to current git tree.
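
The io_schedule() bit is a correctness issue, not just a performance
one: a task must not sleep waiting for I/O while the requests it is
waiting on still sit in its own private plug list. Roughly (only the
replug_current_nested() name comes from the change above, the rest is
sketched):

	void io_schedule(void)
	{
		/*
		 * Flush this task's private plug list before blocking,
		 * otherwise we could wait forever on I/O that we are
		 * ourselves holding back.
		 */
		replug_current_nested();
		schedule();
	}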

I'll try to do some serious testing on this soon. It would also be nice
to retain the plugging information for blktrace, even if it isn't per
queue anymore. Hmmm.

--
Jens Axboe
