Re: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

From: Sasha Levin
Date: Thu Jan 31 2019 - 10:20:03 EST


On Tue, Jan 29, 2019 at 07:20:28PM +0000, Dexuan Cui wrote:
From: Kimberly Brown <kimbrownkd@xxxxxxxxx>
> ...
> But as you pointed, at least for sub-channels, channel->ringbuffer_page
> can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
> the "attribute->show()" could crash when the race happens.
> Adding channel_mutex here seems to be able to fix the race for
> sub-channels, as the same channel_mutex is used in
vmbus_disconnect_ring().
>
> For a primary channel, vmbus_close() -> vmbus_free_ring() can still
> free channel->ringbuffer_page without notifying the "attribute->show()".
> We may also need to acquire/release the channel_mutex in vmbus_close()?
>
> > Actually, there should be checks that "chan" is not null and that
> > "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> > need to fix that.
> I suppose "chan" can not be NULL here (see the above).
>
> Checking "chan->state" may not help to completely fix the race, because
> there is no locking/synchronization code in
> vmbus_close_internal() when we test and change "channel->state".
>

The calls to vmbus_close_internal() for the subchannels and the primary
channel are protected with channel_mutex in vmbus_disconnect_ring().
This prevents "channel->state" from changing while "attribute->show()" is
running.
Ah, I think you're right.

> I guess we may need to check if channel->ringbuffer_page is NULL in
> the "attribute->show()".
>

For the primary channel, vmbus_free_ring() is called after the
return from vmbus_disconnect_ring(). Therefore, the primary channel's
state is changed before "channel->ringbuffer_page" is set to NULL.
Checking the channel state should be sufficient to prevent the ring
buffers from being freed while "attribute->show()" is running. The
ring buffers can't be freed until the channel's state is changed, and
the channel state change is protected by the mutex.
I think you're right (I noticed in a previous mail you mentioned you would
improve your patch to check "chan->state" with the mutax held).

I think checking that "channel->ringbuffer_page" is not NULL would
also work, but, as you stated, we would need to aquire/release
channel_mutex in vmbus_close().
Then it looks unnecessary to check "channel->ringbuffer_page".

> PS, to prove that a race condition does exist and can really cause a panic or
> something, I usually add some msleep() delays in different paths so that I
> can reproduce the crash every time I take a special action, e.g. when I read
> the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can prove
> a patch can indeed help, at least it can fix the crash which would happen
> without the patch. :-)
>

Thanks! I was able to free the ring buffers while "attribute->show()"
was running, which caused a null pointer dereference bug. As expected,
the mutex lock fixed it.
Awesome!

I've queued this one for hyper-fixes, thanks all!

--
Thanks,
Sasha