Re: Block device throttling [Re: Distributed storage.]

From: Daniel Phillips
Date: Mon Aug 13 2007 - 11:18:30 EST


On Monday 13 August 2007 05:18, Evgeniy Polyakov wrote:
> > Say you have a device mapper device with some physical device
> > sitting underneath, the classic use case for this throttle code.
> > Say 8,000 threads each submit an IO in parallel. The device mapper
> > mapping function will be called 8,000 times with associated
> > resource allocations, regardless of any throttling on the physical
> > device queue.
>
> Each thread will sleep in generic_make_request(), if limit is
> specified correctly, then allocated number of bios will be enough to
> have a progress.

The problem is, the sleep does not occur before the virtual device
mapping function is called. Let's consider two devices, a physical
device named pdev and a virtual device sitting on top of it called
vdev. vdev's throttle limit is just one element, but we will see that
in spite of this, two bios can be handled by the vdev's mapping method
before any IO completes, which violates the throttling rules. According
to your patch it works like this:

Thread 1                                Thread 2

<no wait because vdev->bio_queued is zero>

vdev->q->bio_queued++

<enter devmapper map method>

blk_set_bdev(bio, pdev)
     vdev->bio_queued--

                                        <no wait because vdev->bio_queued is zero>

                                        vdev->q->bio_queued++

                                        <enter devmapper map method>

                                        whoops! Our virtual device mapping
                                        function has now allocated resources
                                        for two in-flight bios in spite of
                                        having its throttle limit set to 1.

Perhaps you never worried about the resources that the device mapper
mapping function allocates to handle each bio and so did not consider
this hole significant. These resources can be significant, as is the
case with ddsnap. It is essential to close that window through which
the virtual device's queue limit may be violated. Not doing so will
allow deadlock.
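
To make the interleaving above concrete, here is a rough single-threaded
replay of it in plain C. This is only a sketch, not code from your patch:
struct sim_queue, sim_may_submit(), sim_peak_mapped() and the release_point
enum are names I made up for illustration, and the counter merely stands in
for bio_queued. It compares dropping the vdev count when the bio is
redirected to pdev against holding it until the IO completes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a throttled queue; not the real request_queue. */
struct sim_queue {
    int bio_queued;   /* bios currently counted against the limit */
    int bio_limit;    /* throttle limit */
};

/* Where the vdev count is dropped (assumed policies, for comparison only). */
enum release_point { RELEASE_ON_REMAP, RELEASE_ON_COMPLETION };

/* Would a submitter pass the throttle check without sleeping? */
static bool sim_may_submit(const struct sim_queue *q)
{
    return q->bio_queued < q->bio_limit;
}

/*
 * Replay the timeline: thread 1 submits and is remapped to pdev, then
 * thread 2 submits before any IO has completed. Returns the peak number
 * of bios for which the vdev map method holds resources at once.
 */
static int sim_peak_mapped(enum release_point when)
{
    struct sim_queue vdev = { .bio_queued = 0, .bio_limit = 1 };
    int mapped = 0;   /* bios holding map-method resources */
    int peak = 0;

    for (int thread = 1; thread <= 2; thread++) {
        if (!sim_may_submit(&vdev))
            break;                /* the real thread would sleep here */
        vdev.bio_queued++;
        mapped++;                 /* enter map method, allocate resources */
        if (mapped > peak)
            peak = mapped;
        if (when == RELEASE_ON_REMAP)
            vdev.bio_queued--;    /* bio redirected to pdev, count dropped */
        /* no completion happens inside this window, so mapped never falls */
    }
    return peak;
}
```

With the limit at 1, sim_peak_mapped(RELEASE_ON_REMAP) reports two bios
inside the map method, while sim_peak_mapped(RELEASE_ON_COMPLETION) reports
one, because the second submitter is held off until an IO completes.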

Regards,

Daniel