[PATCH] xen/blkback: Check for insane amounts of request on the ring.

From: Konrad Rzeszutek Wilk
Date: Wed Jun 05 2013 - 08:47:51 EST


Check that the ring does not have an insane amount of requests
(more than there could fit on the ring).

If we detect this case we will stop processing the requests
and wait until the XenBus disconnects the ring.

Looking at the code, one would expect that the existing check
RING_REQUEST_CONS_OVERFLOW which checks for how many responses we
have created in the past (rsp_prod_pvt) vs requests consumed (req_cons)
and that difference between is greater or equal to the size of the
ring. If that condition is satisfied that means we should not be
processing more requests as we still have a backlog of responses
to finish. Note that both of those values (rsp_prod_pvt and req_cons)
are not exposed on the shared ring.

To understand this problem a mini crash course in ring protocol
response/request updates.

There are four entries: req_prod and rsp_prod; req_event and rsp_event.
We are only concerned about the first two - which set the tone of
this bug.

The req_prod is a value incremented by frontend for each request put
on the ring. Conversely the rsp_prod is a value incremented by the backend
for each response put on the ring (rsp_prod gets set by rsp_prod_pvt when
pushing the responses on the ring). Both values can
wrap and are modulo the size of the ring (in block case that is 32).
Please see RING_GET_REQUEST and RING_GET_RESPONSE for the more details.

The culprit here is that if the difference between the
req_prod and req_cons is greater than the ring size we have a problem.
Fortunately for us, the '__do_block_io_op' loop:

rc = blk_rings->common.req_cons;
rp = blk_rings->common.sring->req_prod;

while (rc != rp) {

..
blk_rings->common.req_cons = ++rc; /* before make_response() */

}

will loop up to the point when rc == rp. The macros inside of the
loop (RING_GET_REQUEST) is smart and is indexing based on the modulo
of the ring size. If the frontend has provided a bogus req_prod value
we will loop until the 'rc == rp' - which means we could be processing
already processed requests (or responses) often.

The reason the RING_REQUEST_CONS_OVERFLOW is not helping here is
b/c it only tracks how many responses we have internally produced
and whether we would should process more. The astute reader will
notice that the macro RING_REQUEST_CONS_OVERFLOW provides two
arguments - more on this later.

For example, if we were to enter this function with these values:

blk_rings->common.sring->req_prod = X+31415 (X is the value from
the last time __do_block_io_op was called).
blk_rings->common.req_cons = X
blk_rings->common.rsp_prod_pvt = X

The RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, blk_rings->common.req_cons)
is doing:

req_cons - sring->rsp_prod_pvt >= 32

Which is,
X - X >= 32 or 0 >= 32

And that is false, so we continue on looping (this bug).

If we re-use said macro RING_REQUEST_CONS_OVERFLOW and pass in the rp
instead (sring->req_prod) of rc, the this macro can do the check:

req_prod - sring->rsp_prov_pvt >= 32

Which is,
X + 31415 - X >= 32 , or 31415 >= 32

which is true, so we can error out and break out of the function.

Cc: stable@xxxxxxxxxxxxxxx
[v1: Move the check outside the loop]
[v2: Add a pr_warn as suggested by David]
[v3: Use RING_REQUEST_CONS_OVERFLOW as suggested by Jan]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
---
drivers/block/xen-blkback/blkback.c | 14 ++++++++++++--
drivers/block/xen-blkback/common.h | 2 ++
drivers/block/xen-blkback/xenbus.c | 2 ++
3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index d81dfca..c4b23be 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -397,7 +397,7 @@ int xen_blkif_schedule(void *arg)
{
struct xen_blkif *blkif = arg;
struct xen_vbd *vbd = &blkif->vbd;
-
+ int rc = 0;
xen_blkif_get(blkif);

while (!kthread_should_stop()) {
@@ -417,8 +417,12 @@ int xen_blkif_schedule(void *arg)
blkif->waiting_reqs = 0;
smp_mb(); /* clear flag *before* checking for work */

- if (do_block_io_op(blkif))
+ rc = do_block_io_op(blkif);
+ if (rc > 0)
blkif->waiting_reqs = 1;
+ if (rc == -EACCES)
+ wait_event_interruptible(blkif->shutdown_wq,
+ kthread_should_stop());

if (log_stats && time_after(jiffies, blkif->st_print))
print_stats(blkif);
@@ -778,6 +782,12 @@ __do_block_io_op(struct xen_blkif *blkif)
rp = blk_rings->common.sring->req_prod;
rmb(); /* Ensure we see queued requests up to 'rp'. */

+ /* N.B. 'rp', not 'rc'. */
+ if (RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, rp)) {
+ pr_warn(DRV_PFX "Frontend provided bogus ring requests (%d - %d = %d). Halting ring processing on dev=%04x\n",
+ rp, rc, rp - rc, blkif->vbd.pdevice);
+ return -EACCES;
+ }
while (rc != rp) {

if (RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, rc))
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 60103e2..43ef7f2 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -231,6 +231,8 @@ struct xen_blkif {
unsigned long long st_wr_sect;

wait_queue_head_t waiting_to_free;
+ /* Thread shutdown wait queue. */
+ wait_queue_head_t shutdown_wq;
};


diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 8bfd1bc..00f2573 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -118,6 +118,7 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
blkif->st_print = jiffies;
init_waitqueue_head(&blkif->waiting_to_free);
blkif->persistent_gnts.rb_node = NULL;
+ init_waitqueue_head(&blkif->shutdown_wq);

return blkif;
}
@@ -177,6 +178,7 @@ static int xen_blkif_map(struct xen_blkif *blkif, unsigned long shared_page,
static void xen_blkif_disconnect(struct xen_blkif *blkif)
{
if (blkif->xenblkd) {
+ wake_up(&blkif->shutdown_wq);
kthread_stop(blkif->xenblkd);
blkif->xenblkd = NULL;
}
--
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/