Re: [PATCH] blk-mq: fix waitqueue_active without memory barrier in block/blk-mq-tag.c

From: Kosuke Tatsukawa
Date: Sat Oct 10 2015 - 01:04:39 EST


Jens Axboe wrote:
> On 10/08/2015 06:35 PM, Kosuke Tatsukawa wrote:
>> blk_mq_tag_update_depth() seems to be missing a memory barrier which
>> might cause the waker to not notice the waiter and fail to send a
>> wake_up as in the following figure.
>>
>>      blk_mq_tag_update_depth                  bt_get
>> ------------------------------------------------------------------------
>> if (waitqueue_active(&bs->wait))
>> /* The CPU might reorder the test for
>>    the waitqueue up here, before
>>    prior writes complete */
>>                                         prepare_to_wait(&bs->wait, &wait,
>>                                           TASK_UNINTERRUPTIBLE);
>>                                         tag = __bt_get(hctx, bt, last_tag,
>>                                           tags);
>>                                         /* Value set in bt_update_count not
>>                                            visible yet */
>> bt_update_count(&tags->bitmap_tags, tdepth);
>> /* blk_mq_tag_wakeup_all(tags, false); */
>>  bt = &tags->bitmap_tags;
>>  wake_index = atomic_read(&bt->wake_index);
>>  ...
>>                                         io_schedule();
>> ------------------------------------------------------------------------
>>
>> This patch adds the missing memory barrier.
>>
>> I found this issue when I was looking through the linux source code
>> for places calling waitqueue_active() before wake_up*(), but without
>> preceding memory barriers, after sending a patch to fix a similar
>> issue in drivers/tty/n_tty.c (Details about the original issue can be
>> found here: https://lkml.org/lkml/2015/9/28/849).
>>
>> Signed-off-by: Kosuke Tatsukawa <tatsu@xxxxxxxxxxxxx>
>> ---
>> block/blk-mq-tag.c | 4 ++++
>> 1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index ed96474..7a6b6e2 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -75,6 +75,10 @@ void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
>>  	struct blk_mq_bitmap_tags *bt;
>>  	int i, wake_index;
>>
>> +	/*
>> +	 * Make sure all changes prior to this are visible from other CPUs.
>> +	 */
>> +	smp_mb();
>>  	bt = &tags->bitmap_tags;
>>  	wake_index = atomic_read(&bt->wake_index);
>>  	for (i = 0; i < BT_WAIT_QUEUES; i++) {
>>
>
> Thanks, after looking at this, I think this patch is fine. It's not a
> super hot path, so not worth it to further optimize this or look into
> ways to avoid the barrier. I do wonder if there are archs where
> atomic_read() is a memory barrier, in that case we need not do it at
> all. And perhaps we have some weird smp_before_bla variant that could be
> used here instead to improve upon that case.

Looking roughly at include/asm/atomic.h in the various architectures,
atomic_read seems to be defined in many of them as a macro or an inline
function calling
ACCESS_ONCE((v)->counter)
which doesn't imply a memory barrier.
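
To illustrate the point, here is a minimal user-space sketch, not the
kernel code itself. ACCESS_ONCE is modelled on the kernel macro (a volatile
access only), and atomic_t / atomic_read are simplified stand-ins. struct
model_queue and update_depth_and_wake() are hypothetical names made up for
this example, and smp_mb() is approximated with the GCC/Clang builtin
__sync_synchronize(). The volatile read only constrains the compiler, so
the waker still needs the explicit full barrier between its write and the
check for waiters.

```c
#include <stdbool.h>

/* Modelled on the kernel's ACCESS_ONCE(): a volatile access that keeps the
 * compiler from caching or refetching the value. It emits no CPU barrier. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Simplified stand-ins for the kernel's atomic_t and atomic_read(). */
typedef struct { int counter; } atomic_t;

static inline int atomic_read(const atomic_t *v)
{
	return ACCESS_ONCE(v->counter);	/* plain load, no ordering implied */
}

/* Assumption: a full barrier builtin standing in for the kernel's smp_mb(). */
#define smp_mb() __sync_synchronize()

/* Hypothetical model: 'depth' is the value written by bt_update_count(),
 * 'waiters' plays the role of waitqueue_active(), 'wakeups' counts the
 * wake_up() calls that would have been issued. */
struct model_queue {
	int depth;
	int waiters;
	int wakeups;
};

/* Waker side: write, full barrier, then test for waiters. */
static bool update_depth_and_wake(struct model_queue *q, int new_depth)
{
	ACCESS_ONCE(q->depth) = new_depth;	/* the "prior write" */
	smp_mb();				/* order the write before the test */
	if (ACCESS_ONCE(q->waiters) > 0) {	/* waitqueue_active() analogue */
		q->wakeups++;			/* wake_up() analogue */
		return true;
	}
	return false;
}
```

Without the smp_mb() line, nothing in this sketch would stop a weakly
ordered CPU from performing the load of q->waiters before the store to
q->depth becomes visible, which is the reordering shown in the figure.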

blackfin seems to be calling an assembler function which does "flush
core internal write buffer".

I'm not sure about the memory ordering of the assembler instructions for
metag, powerpc and s390 though.
---
Kosuke TATSUKAWA | 3rd IT Platform Department
| IT Platform Division, NEC Corporation
| tatsu@xxxxxxxxxxxxx