Re: iommu_iova leak [inside 3w-9xxx]

From: Chris Boot
Date: Mon Sep 19 2011 - 11:33:09 EST


On 18/09/2011 17:19, Chris Boot wrote:
> On 18/09/2011 15:56, James Bottomley wrote:
>> On Sun, 2011-09-18 at 18:25 +0400, James Bottomley wrote:
>>> On Sun, 2011-09-18 at 15:05 +0100, Chris Boot wrote:
>>>>> Hardly ... all it's saying is that twa_exit doesn't wait for pending I/O
>>>>> to complete, so when you remove the module it tears down in the middle
>>>>> of an I/O. A bug, yes, but it's not indicative of any sort of leak in
>>>>> the maps/unmaps.
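
To picture the failure mode described there, here is a toy user-space analogue (nothing in it is driver code; a pthread worker merely stands in for an I/O that is still in flight when teardown runs; build with cc -pthread):

/* toy illustration: tearing down state while "I/O" is still in flight */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static volatile int device_present = 1;    /* stands in for module state */

static void *in_flight_io(void *arg)
{
    (void)arg;
    sleep(1);                  /* the I/O is still outstanding... */
    if (!device_present)
        printf("I/O completed after teardown: the warning case\n");
    return NULL;
}

int main(void)
{
    pthread_t io;

    pthread_create(&io, NULL, in_flight_io, NULL);
    device_present = 0;        /* "twa_exit"-style teardown: no waiting */
    /* a correct exit path would quiesce and wait for the I/O first */
    pthread_join(io, NULL);    /* joined here only so the demo can report */
    return 0;
}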

>>>> James,
>>>>
>>>> I don't think that's the case - I had unmounted all filesystems, deactivated all volume groups, and performed a sync before waiting a few seconds and running rmmod. Next time I'll also 'echo 1 > /sys/block/sdX/device/delete' if that's helpful.
>>> Actually, I take all that back: the driver has a bug in QUEUE_FULL
>>> handling: twa_scsi_queue() calls twa_scsiop_execute_scsi(), which maps
>>> the dma buffer, but if the card responds QUEUE_FULL it just returns
>>> SCSI_MLQUEUE_HOST_BUSY without ever unmapping. That leg in the code
>>> frees the request but also doesn't unmap it. In fact any error return
>>> from twa_scsiop_execute_scsi() seems to have the same problem (but
>>> QUEUE_FULL is the only silent one).
>>>
>>> I trust Adam will fix this.
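
That diagnosis is the classic acquire-without-release error-path bug. A minimal user-space sketch of the same shape (illustrative only: acquire_mapping(), release_mapping(), execute_command() and queue_command() are made-up stand-ins, not the driver's functions):

#include <stdlib.h>

/* made-up stand-ins for the driver's DMA map/unmap pair */
static void *acquire_mapping(size_t len) { return malloc(len); }
static void release_mapping(void *m) { free(m); }

/* plays the role of twa_scsiop_execute_scsi(): maps first, may still fail */
static int execute_command(void **mapping, int queue_full)
{
    *mapping = acquire_mapping(512);
    if (queue_full)
        return 1;    /* failure reported AFTER the mapping was taken */
    return 0;
}

/* plays the role of twa_scsi_queue(): its error leg must unmap too */
static int queue_command(int queue_full)
{
    void *mapping;

    if (execute_command(&mapping, queue_full)) {
        release_mapping(mapping);    /* without this line, every
                                        QUEUE_FULL leaks one mapping */
        return -1;
    }
    release_mapping(mapping);    /* normal path unmaps on completion */
    return 0;
}

int main(void)
{
    return queue_command(1) == -1 ? 0 : 1;
}
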
>> Actually, while Adam's mulling this, try the following. It should at
>> least confirm we're on the right track.
>>
>> James
>>
>> ---
>>
>> diff --git a/drivers/scsi/3w-9xxx.c b/drivers/scsi/3w-9xxx.c
>> index b7bd5b0..3868ab2 100644
>> --- a/drivers/scsi/3w-9xxx.c
>> +++ b/drivers/scsi/3w-9xxx.c
>> @@ -1800,10 +1800,12 @@ static int twa_scsi_queue_lck(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_cmnd *))
>>  	switch (retval) {
>>  	case SCSI_MLQUEUE_HOST_BUSY:
>>  		twa_free_request_id(tw_dev, request_id);
>> +		twa_unmap_scsi_data(tw_dev, request_id);
>>  		break;
>>  	case 1:
>>  		tw_dev->state[request_id] = TW_S_COMPLETED;
>>  		twa_free_request_id(tw_dev, request_id);
>> +		twa_unmap_scsi_data(tw_dev, request_id);
>>  		SCpnt->result = (DID_ERROR << 16);
>>  		done(SCpnt);
>>  		retval = 0;

> James,
>
> After testing quite thoroughly on my test install (lots of parallel e2fsck and an rsync from one LV to another) I couldn't trigger the warning. It's probably too early to tell for sure whether this fixes things on my normal workload (I'll be able to tell definitively tomorrow), but it certainly seems much better right now. I'll keep you posted.

The patch above certainly seems to fix things for me. After almost 24 hours of uptime, my iommu_iova slab looks entirely reasonable:

iommu_iova 592 885 64 59 1 : tunables 120 60 8 : slabdata 15 15 60
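
(The first three columns are active objects, total objects, and object size in bytes.) For keeping an eye on it over time, a small sketch that pulls the iommu_iova line out of /proc/slabinfo, assuming the classic text format shown above:

/* watch the iommu_iova slab for unbounded growth; may need root */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[512];
    FILE *f = fopen("/proc/slabinfo", "r");

    if (!f) {
        perror("fopen /proc/slabinfo");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        /* <name> <active_objs> <num_objs> <objsize> ... */
        if (strncmp(line, "iommu_iova", 10) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}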

We'll see in a few days' time if this resolves my entire performance problem, but I'm nearly 100% confident it does.

Thanks.

Chris

--
Chris Boot
bootc@xxxxxxxxx
