[07/67] cfq-iosched: fix cfq_cic_link() race confition

From: Greg KH
Date: Tue Jan 03 2012 - 18:08:47 EST


3.0-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>

commit 5eb46851de3904cd1be9192fdacb8d34deadc1fc upstream.

cfq_cic_link() has race condition. When some processes which shared ioc
issue I/O to same block device simultaneously, cfq_cic_link() returns -EEXIST
sometimes. The race condition might stop I/O by following steps:

step 1: Process A: Issue an I/O to /dev/sda
step 2: Process A: Get an ioc (iocA here) in get_io_context() which does not
linked with a cic for the device
step 3: Process A: Get a new cic for the device (cicA here) in
cfq_alloc_io_context()

step 4: Process B: Issue an I/O to /dev/sda
step 5: Process B: Get iocA in get_io_context() since process A and B share the
same ioc
step 6: Process B: Get a new cic for the device (cicB here) in
cfq_alloc_io_context() since iocA has not been linked with a
cic for the device yet

step 7: Process A: Link cicA to iocA in cfq_cic_link()
step 8: Process A: Dispatch I/O to driver and finish it

step 9: Process B: Try to link cicB to iocA in cfq_cic_link()
But it fails with showing "cfq: cic link failed!" kernel
message, since iocA has already linked with cicA at step 7.
step 10: Process B: Wait for finishig I/O in get_request_wait()
The function does not wake up, when there is no I/O to the
device.

When cfq_cic_link() returns -EEXIST, it means ioc has already linked with cic.
So when cfq_cic_link() return -EEXIST, retry cfq_cic_lookup().

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxx>

---
block/cfq-iosched.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3169,7 +3169,7 @@ static int cfq_cic_link(struct cfq_data
}
}

- if (ret)
+ if (ret && ret != -EEXIST)
printk(KERN_ERR "cfq: cic link failed!\n");

return ret;
@@ -3185,6 +3185,7 @@ cfq_get_io_context(struct cfq_data *cfqd
{
struct io_context *ioc = NULL;
struct cfq_io_context *cic;
+ int ret;

might_sleep_if(gfp_mask & __GFP_WAIT);

@@ -3192,6 +3193,7 @@ cfq_get_io_context(struct cfq_data *cfqd
if (!ioc)
return NULL;

+retry:
cic = cfq_cic_lookup(cfqd, ioc);
if (cic)
goto out;
@@ -3200,7 +3202,12 @@ cfq_get_io_context(struct cfq_data *cfqd
if (cic == NULL)
goto err;

- if (cfq_cic_link(cfqd, ioc, cic, gfp_mask))
+ ret = cfq_cic_link(cfqd, ioc, cic, gfp_mask);
+ if (ret == -EEXIST) {
+ /* someone has linked cic to ioc already */
+ cfq_cic_free(cic);
+ goto retry;
+ } else if (ret)
goto err_free;

out:


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/