Re: [PATCH] PCI/switchtec: Fix init_completion race condition with poll_wait()

From: Logan Gunthorpe
Date: Mon Mar 16 2020 - 21:25:19 EST

On 2020-03-16 6:56 p.m., Thomas Gleixner wrote:
> Logan,
>
> Logan Gunthorpe <logang@xxxxxxxxxxxx> writes:
>
>> The call to init_completion() in mrpc_queue_cmd() can theoretically
>> race with the call to poll_wait() in switchtec_dev_poll().
>>
>>   poll()                         write()
>>     switchtec_dev_poll()           switchtec_dev_write()
>>       poll_wait(&s->comp.wait);      mrpc_queue_cmd()
>>                                        init_completion(&s->comp)
>>                                          init_waitqueue_head(&s->comp.wait)
>
> just a nitpick. As you took the liberty to copy the description of the
> race, which was btw. discovered by me, verbatim from a changelog written
> by someone else w/o providing the courtesy of linking to that original
> analysis, allow me the liberty to add the missing link:
>
> Link: https://lore.kernel.org/lkml/20200313174701.148376-4-bigeasy@xxxxxxxxxxxxx

Well, I just copied the call chain. I had no way to know you were the
one who discovered the bug given the way it was presented to me. And the
original patch didn't include much in the way of analysis of the bug,
just "It's Racy".

I didn't deliberately omit the link; it just never occurred to me to add
it. In retrospect, I should have included it, sorry about that.

>> To my knowledge, no one has hit this bug, but we should fix it for
>> correctness.
>
> s/,but we should fix/.Fix/ ?

Yes, that's an improvement.

>> Fix this by using reinit_completion() instead of init_completion() in
>> mrpc_queue_cmd().
>>
>> Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver")
>> Reported-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
>> Signed-off-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
>
> Acked-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

Thanks.

> @Bjorn: Can you please hold off on this for a few days until we sorted
> out the remaining issues to avoid potential merge conflicts
> vs. the completion series?

I'd suggest simply rebasing the completion patch on this patch, or a
patch like it. Then we'll have the proper bug fix commit and there won't
be a conflict.

Logan