Re: [PATCH RFC] blk_mq: clear rq mapping in driver tags before freeing rqs in sched tags

From: yukuai (C)
Date: Tue Aug 17 2021 - 23:14:59 EST


On 2021/08/18 10:45, Ming Lei wrote:

Please take a look at blk_mq_clear_rq_mapping():

//drv_tags points to set->tags[] which is shared in host wide
struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
...

//tags points to sched_tags
list_for_each_entry(page, &tags->page_list, lru) {
unsigned long start = (unsigned long)page_address(page);
unsigned long end = start + order_to_size(page->private);
int i;

/* clear drv_tags->rq[i] in case it is from this sched tags*/
for (i = 0; i < set->queue_depth; i++) {
struct request *rq = drv_tags->rqs[i];
unsigned long rq_addr = (unsigned long)rq;

if (rq_addr >= start && rq_addr < end) {
WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
cmpxchg(&drv_tags->rqs[i], rq, NULL);
}
}
}

So we do clear tags->rq[] instead of sched_tag->rq[].

Thanks for the correction, I misunderstand the code, my bad...



After switching elevator, tags->rq[tag] still point to the request
that is just freed.

No.


3) nbd server send a reply with random tag directly:

recv_work
nbd_read_stat
blk_mq_tag_to_rq(tags, tag)
rq = tags->rq[tag] -> rq is freed

Usually, nbd will get tag and send a request to server first, and then
handle the reply. However, if the request is skipped, such uaf problem
can be triggered.

When or how is such reply with random tag replied to nbd client? Is it
possible for nbd client to detect such un-expected/bad situation?

We see that the random tag replied to nbd client in our syzkaller test,
I guess it will not happen in the common case.

nbd_read_stat() is trying to get a valid request from the tag and
complete the request since the server send such reply to client.

There is a way that I can think of to detect the situation that server
reply to client without client's request:

1) get tag from the reply message

2) check that the tag is really set in bitmap of nbd->tag_set.tags[]
If the client didn't send the request message, the driver_tag is not
accquired yet, thus this check can detect this situation.

3) call blk_mq_tag_to_dile to get the request

What if blk_mq_tag_to_rq() is just called before/when we clear tags->rq[]?

The concurrent scenario is still possible currently.

Thanks
Kuai