Re: [PATCH] bfq: Check if bfqq is NULL in bfq_insert_request

From: Guenter Roeck
Date: Mon Jul 22 2019 - 20:06:46 EST


On Tue, Jul 23, 2019 at 07:36:16AM +0800, Bob Liu wrote:
> On 7/23/19 4:29 AM, Guenter Roeck wrote:
> > On Mon, Jul 22, 2019 at 10:30:48AM -0700, Guenter Roeck wrote:
> >> In bfq_insert_request(), bfqq is initialized with:
> >> bfqq = bfq_init_rq(rq);
> >> In bfq_init_rq(), we find:
> >> if (unlikely(!rq->elv.icq))
> >> return NULL;
> >> Indeed, rq->elv.icq can be NULL if the memory allocation in
> >> create_task_io_context() failed.
> >>
> >
> > The above function should have been ioc_create_icq(), sorry.
> >
> > Guenter
> >
> >> A comment in bfq_insert_request() suggests that bfqq is supposed to be
> >> non-NULL if 'at_head || blk_rq_is_passthrough(rq)' is false. Yet, as
> >> debugging and practical experience shows, this is not the case in the
> >> above situation.
> >>
> >> This results in the following crash.
> >>
> >> Unable to handle kernel NULL pointer dereference
> >> at virtual address 00000000000001b0
> >> ...
> >> Call trace:
> >> bfq_setup_cooperator+0x44/0x134
> >> bfq_insert_requests+0x10c/0x630
> >> blk_mq_sched_insert_requests+0x60/0xb4
> >> blk_mq_flush_plug_list+0x290/0x2d4
> >> blk_flush_plug_list+0xe0/0x230
> >> blk_finish_plug+0x30/0x40
> >> generic_writepages+0x60/0x94
> >> blkdev_writepages+0x24/0x30
> >> do_writepages+0x74/0xac
> >> __filemap_fdatawrite_range+0x94/0xc8
> >> file_write_and_wait_range+0x44/0xa0
> >> blkdev_fsync+0x38/0x68
> >> vfs_fsync_range+0x68/0x80
> >> do_fsync+0x44/0x80
> >>
> >> The problem is relatively easy to reproduce by running an image with
> >> failslab enabled, such as with:
> >>
> >> cd /sys/kernel/debug/failslab
> >> echo 10 > probability
> >> echo 300 > times
> >>
> >> Avoid the problem by checking if bfqq is NULL before using it. With the
> >> NULL check in place, requests with missing io context are queued
> >> immediately, and the crash is no longer seen.
> >>
>
>
> What about other place use blk_init_rq()?
> E.g in bfq_request_merged():
> 1897 struct bfq_queue *bfqq = bfq_init_rq(req);
> 1898 struct bfq_data *bfqd = bfqq->bfqd;
>
> Which may have same Null-pointer bug?
>
Good question. Doug asked it as well. My understanding is that the code
won't hit those places since bfq is essentially bypassed. Of course,
I may be completely wrong, so we should wait for Paolo to confirm
if I my understanding is correct (or wrong).

Thanks,
Guenter

> -Bob
>
> >> Fixes: 18e5a57d79878 ("block, bfq: postpone rq preparation to insert or merge")
> >> Reported-by: Hsin-Yi Wang <hsinyi@xxxxxxxxxx>
> >> Cc: Hsin-Yi Wang <hsinyi@xxxxxxxxxx>
> >> Cc: Nicolas Boichat <drinkcat@xxxxxxxxxxxx>
> >> Cc: Doug Anderson <dianders@xxxxxxxxxxxx>
> >> Signed-off-by: Guenter Roeck <linux@xxxxxxxxxxxx>
> >> ---
> >> block/bfq-iosched.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> >> index 72860325245a..56f3f4227010 100644
> >> --- a/block/bfq-iosched.c
> >> +++ b/block/bfq-iosched.c
> >> @@ -5417,7 +5417,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >>
> >> spin_lock_irq(&bfqd->lock);
> >> bfqq = bfq_init_rq(rq);
> >> - if (at_head || blk_rq_is_passthrough(rq)) {
> >> + if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
> >> if (at_head)
> >> list_add(&rq->queuelist, &bfqd->dispatch);
> >> else
> >> --
> >> 2.7.4
> >>
>