Re: [PATCH 1/2] block: fix leaks associated with discard request payload
From: FUJITA Tomonori
Date: Sun Jun 27 2010 - 06:35:47 EST
On Sun, 27 Jun 2010 19:01:33 +0900
FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx> wrote:
> On Sun, 27 Jun 2010 11:26:52 +0200
> Christoph Hellwig <hch@xxxxxx> wrote:
> > On Sun, Jun 27, 2010 at 05:49:29PM +0900, FUJITA Tomonori wrote:
> > > On Sat, 26 Jun 2010 15:56:50 -0400
> > > Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> > >
> > > > Fix leaks introduced via "block: don't allocate a payload for discard
> > > > request" commit a1d949f5f44.
> > > >
> > > > sd_done() is not called for REQ_TYPE_BLOCK_PC commands so cleanup
> > > > discard request's payload directly in scsi_finish_command().
> > >
> > > Instead of adding another discard hack to scsi_finish_command(), how
> > > about converting discard to REQ_TYPE_FS request? discard is FS request
> > > from the perspective of the block layer. It also fixes a problem that
> > > discard isn't retried in the case of UNIT ATTENTION.
> > >
> > > I think that we can get cleaner code if we handle discard as a
> > > normal (fs) request in the block layer (and scsi-ml). We need more
> > > changes but this patch is the first step.
> > Making discard a REQ_TYPE_FS inside scsi (it already is before entering
> > sd_prep_fn) means we'll need to special case it all over the I/O
> > submission and completion path. Having the payload length not matching
> Hmm, my patch doesn't add any special case in scsi submission and
> completion. sd_prep_fn already has a hack for discard to set
> bi->bi_size to rq->__data_size so scsi can tell the block layer to
> finish discard requests.
> Adding another special case for discard to scsi_io_completion()
> doesn't look good.
> About the block layer, we already have special cases for discard
> everywhere (rq->cmd_flags & REQ_DISCARD).
> > the transfer length is something we don't expect for FS requests.
> Yeah, that's tricky. I'm not sure yet which is better: change how the
> block layer handles the transfer length or let the lower layer add
> pages (as we do now).
> > > index e16185b..9e15c46 100644
> > > --- a/block/blk-lib.c
> > > +++ b/block/blk-lib.c
> > > @@ -20,6 +20,10 @@ static void blkdev_discard_end_io(struct bio *bio, int err)
> > > if (bio->bi_private)
> > > complete(bio->bi_private);
> > >
> > > + /* free the page that the lower layer allocated */
> > > + if (bio_page(bio))
> > > + __free_page(bio_page(bio));
> > > +
> > This is exactly what this patchkit gets rid of. Having a payload
> > page that the caller tracks (previously fully, with this patch only for
> > freeing) makes DM's life a lot harder. Remember we don't actually store
> > any payload in there before entering sd_prep_fn - it's just that the
> > scsi commands implementing discards need some payload - either a
> > sector-sized zero-filled buffer for WRITE SAME, or an LBA/len encoding inside
> > the payload for UNMAP.
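
To make the "LBA/len encoding inside the payload for UNMAP" concrete, here is a minimal user-space sketch of a single-extent UNMAP parameter list. It assumes the SBC-3 layout of an 8-byte header followed by one 16-byte block descriptor. The names put_be(), encode_unmap_payload() and UNMAP_PARAM_LEN are hypothetical and chosen for illustration; this is not the code in sd.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 8-byte header + one 16-byte block descriptor. */
#define UNMAP_PARAM_LEN 24

/* Store the low nbytes of v at p in big-endian order
 * (hypothetical stand-in for the kernel's put_unaligned_be*()). */
static void put_be(uint64_t v, uint8_t *p, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
}

/* Fill buf with an UNMAP parameter list describing one extent.
 * Returns the number of payload bytes to transfer. */
static size_t encode_unmap_payload(uint8_t *buf, uint64_t lba,
                                   uint32_t nr_blocks)
{
    assert(buf != NULL);
    memset(buf, 0, UNMAP_PARAM_LEN);

    put_be(6 + 16, &buf[0], 2);     /* UNMAP data length: bytes after this field */
    put_be(16, &buf[2], 2);         /* block descriptor data length */
    /* bytes 4..7 are reserved */
    put_be(lba, &buf[8], 8);        /* first logical block to unmap */
    put_be(nr_blocks, &buf[16], 4); /* number of logical blocks */
    /* bytes 20..23 are reserved */

    return UNMAP_PARAM_LEN;
}
```

The point for this thread is that the 24 bytes only come into existence when the SCSI command is built, so whoever allocates the buffer at that stage also has to arrange for it to be freed at completion.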
> Is it so bad if the block layer frees pages that the lower layer
> allocates? I thought it's ok if the block layer doesn't allocate.
> Is it better if sd_done() frees a page? As my patch does, if we handle
> discard as FS in scsi-ml, sd_done() is called.
How about this?

From: FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>
Subject: [PATCH] convert discard to REQ_TYPE_FS instead of REQ_TYPE_BLOCK_PC

Fixes two issues:

- leak of pages that scsi_setup_discard_cmnd() allocates (because we
  don't call sd_done for pc requests).

- discard requests aren't retried when possible (e.g. UNIT ATTENTION).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>
drivers/scsi/sd.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d447726..056c8e1 100644
@@ -432,7 +432,6 @@ static int scsi_setup_discard_cmnd(struct scsi_device *sdp, struct request *rq)
nr_sectors >>= 3;
- rq->cmd_type = REQ_TYPE_BLOCK_PC;
rq->timeout = SD_TIMEOUT;
memset(rq->cmd, 0, rq->cmd_len);
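
The reasoning behind the one-line change can be modelled in user space. The sketch below is a toy model, not kernel code: it assumes, as the thread states, that the driver's completion hook (sd_done() in the real code) runs only for FS requests and that only FS requests are retried on UNIT ATTENTION. All names here (struct request, prep_discard(), driver_done(), finish_command(), retry_on_unit_attention(), live_payloads) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for REQ_TYPE_FS / REQ_TYPE_BLOCK_PC. */
enum req_type { REQ_FS, REQ_BLOCK_PC };

struct request {
    enum req_type type;
    void *payload;          /* buffer allocated while preparing the command */
};

static int live_payloads;   /* payloads allocated but not yet freed */

/* Models command setup: allocate the payload and pick a request type. */
static void prep_discard(struct request *rq, enum req_type type)
{
    rq->type = type;
    rq->payload = malloc(24);
    assert(rq->payload != NULL);
    live_payloads++;
}

/* Models the driver completion hook that frees the discard payload. */
static void driver_done(struct request *rq)
{
    free(rq->payload);
    rq->payload = NULL;
    live_payloads--;
}

/* Models mid-layer completion: the hook is skipped for PC requests. */
static void finish_command(struct request *rq)
{
    if (rq->type == REQ_FS)
        driver_done(rq);
}

/* Models the retry decision described in the thread. */
static bool retry_on_unit_attention(const struct request *rq)
{
    return rq->type == REQ_FS;
}
```

In this model a discard left as REQ_FS is freed at completion and is eligible for retry, while one marked REQ_BLOCK_PC keeps its payload allocated after finish_command() returns, which corresponds to the leak described in the changelog.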