Re: [PATCH v3 0/2] block,scsi: fixup blk_get_request dead queue scenarios

From: Joe Lawrence
Date: Wed Aug 27 2014 - 11:34:08 EST


On Wed, 27 Aug 2014 08:07:29 -0600
Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 08/26/2014 04:01 PM, Jeff Moyer wrote:
> >> Additionally, there's still quite a few places that call
> >> blk_get_request() and don't check the error return if __GFP_WAIT is set.
> >> Since most of the point of this is to fix segfaulting on queue dead
> >> scenarios, why aren't they all converted?
> >
> > Odd, I thought they all were converted last I checked. They definitely
> > should be.
>
> drivers/ide/ide-park.c:issue_park_cmd() (patch oddly converts just the one?!)
> drivers/ide/ide-pm.c:generic_ide_suspend()
> drivers/ide/ide-pm.c:generic_ide_resume()
> drivers/ide/ide-cd.c:ide_cd_queue_pc()
> drivers/ide/ide-atapi.c:ide_queue_pc_tail()
> drivers/ide/ide-ioctls.c:ide_cmd_ioctl()
> drivers/ide/ide-ioctls.c:generic_drive_reset()
> drivers/ide/ide-taskfile.c:ide_raw_taskfile()
> drivers/ide/ide-tape.c:idetape_queue_rw_tail()
> drivers/ide/ide-cd_ioctls.c:ide_cdrom_reset()
> drivers/ide/ide-disk.c:set_multcount()
> drivers/ide/ide-devsets.c:ide_devset_execute()
>
> Why only one location in ide-park.c was converted and the rest of IDE
> left untouched, I don't know. But there are definitely lots of them left
> in there.

These files didn't seem to have much recent development going on, so my
thinking was that if a caller already bothered to check the return from
blk_get_request(), I would update that check. If the code didn't include
such a check in the first place, I left it alone.

> There's also a bug in osd_initiator.c, _init_blk_request(). We jump to
> 'out' for IS_ERR(req), which attempts to print or->request, which hasn't
> been assigned yet. This is my primary concern with this patch, basically
> every single one of these call sites must be verified or it will do more
> harm than good. Have they been?

So the _init_blk_request bug has been there since commit c29b70f6, when
the _make_request wrapper was introduced. I missed it when inspecting
the surrounding areas that the patch modified.

Given the scope of the changes, I agree that the risk of introducing
another bug is real. I think either you or James suggested splitting
this fix into two parts: the first patch avoiding the crash I originally
saw, the second modifying the API to propagate out the additional
information required to discern why blk_get_request failed.

As I mentioned in [1], the call chain into blk_get_request is pretty
wide. I did my best to hunt down the callers' callers' callers, etc. to
figure out how the returns are handled, but without testing every single
site, I can't be 100% sure. In the end, I'd be happy with patch 1 alone
to avoid the original crash report. Patch 2 was the truth-and-beauty
approach. Whether it's worth the risk is a judgment call on your part.

[1] http://www.spinics.net/lists/kernel/msg1776335.html

-- Joe