Re: [PATCH] Drivers: hv: util: on deinit, don't wait the release event, if we shouldn't

From: Vitaly Kuznetsov
Date: Tue Feb 28 2017 - 07:57:20 EST


Dexuan Cui <decui@xxxxxxxxxxxxx> writes:

> If the daemon is NOT running at all, when we disable the util device from
> Hyper-V Manager (or sometimes the host can rescind a util device and then
> re-offer it), we'll hang in util_remove -> hv_kvp_deinit ->
> wait_for_completion(&release_event), because this code path doesn't run:
> hvt_op_release -> ... -> kvp_on_reset -> complete(&release_event).
>
> Due to this, we can't even reboot the VM properly.
>
> The patch tracks whether the dev file is opened, and we only need to
> wait if it is.
>
> Fixes: 5a66fecbf6aa ("Drivers: hv: util: kvp: Fix a rescind processing issue")
> Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
> Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> Cc: "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx>
> Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Cc: Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>
> ---
> drivers/hv/hv_fcopy.c | 5 ++++-
> drivers/hv/hv_kvp.c | 6 +++++-
> drivers/hv/hv_snapshot.c | 5 ++++-
> drivers/hv/hv_utils_transport.c | 2 ++
> drivers/hv/hv_utils_transport.h | 1 +
> 5 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> index 9aee601..545cf43 100644
> --- a/drivers/hv/hv_fcopy.c
> +++ b/drivers/hv/hv_fcopy.c
> @@ -358,8 +358,11 @@ int hv_fcopy_init(struct hv_util_service *srv)
>
> void hv_fcopy_deinit(void)
> {
> + bool wait = hvt->dev_opened;
> +
> fcopy_transaction.state = HVUTIL_DEVICE_DYING;
> cancel_delayed_work_sync(&fcopy_timeout_work);
> hvutil_transport_destroy(hvt);
> - wait_for_completion(&release_event);
> + if (wait)
> + wait_for_completion(&release_event);

I think this is racy. We need to prevent opening the device first and
only then query its state:

	bool wait;

	fcopy_transaction.state = HVUTIL_DEVICE_DYING;
	/* make sure state is set */
	mb();
	wait = hvt->dev_opened;
	cancel_delayed_work_sync(&fcopy_timeout_work);
	hvutil_transport_destroy(hvt);
	if (wait)
		wait_for_completion(&release_event);

Otherwise someone could open the device after we have sampled
dev_opened but before we have set the state to HVUTIL_DEVICE_DYING.
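A user-space sketch of why "set DYING, barrier, then sample" closes that window. It is a model, not kernel code: atomic_thread_fence(memory_order_seq_cst) stands in roughly for mb(), and the open side re-checking the state after marking itself opened is an assumption (the quoted hunk only shows hvt_op_open() setting dev_opened). All model_* names are hypothetical. With both fences in place, C11 forbids the outcome "open succeeded but deinit decided not to wait".

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: names mirror the kernel code, none is kernel API. */
enum { MODEL_RUNNING, MODEL_DYING };

static atomic_int state;        /* stands in for fcopy_transaction.state */
static atomic_bool dev_opened;  /* stands in for hvt->dev_opened */
static bool deinit_waits;       /* decision taken by the deinit path */
static bool open_succeeded;     /* decision taken by the open path */

/* deinit: publish DYING, full fence (the mb()), then sample the flag */
static void *model_deinit(void *arg)
{
	(void)arg;
	atomic_store_explicit(&state, MODEL_DYING, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	deinit_waits = atomic_load_explicit(&dev_opened, memory_order_relaxed);
	return NULL;
}

/* open (ASSUMED behaviour): mark opened, full fence, then back out if
 * the device is already dying */
static void *model_open(void *arg)
{
	(void)arg;
	atomic_store_explicit(&dev_opened, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	open_succeeded = atomic_load_explicit(&state, memory_order_relaxed)
			 != MODEL_DYING;
	return NULL;
}

/* Race both paths once; true means the bad outcome was observed. */
static bool race_once(void)
{
	pthread_t a, b;
	int rc;

	atomic_store(&state, MODEL_RUNNING);
	atomic_store(&dev_opened, false);
	rc = pthread_create(&a, NULL, model_deinit, NULL);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, model_open, NULL);
	assert(rc == 0);
	(void)rc;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return open_succeeded && !deinit_waits;
}
```

Whichever fence comes first in the single total order, the other side's load is guaranteed to see the preceding store, so either deinit waits or the open backs out. Dropping either fence reopens the store-buffering window.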

> }
> diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
> index de26371..15c7873 100644
> --- a/drivers/hv/hv_kvp.c
> +++ b/drivers/hv/hv_kvp.c
> @@ -742,10 +742,14 @@ hv_kvp_init(struct hv_util_service *srv)
>
> void hv_kvp_deinit(void)
> {
> + bool wait = hvt->dev_opened;
> +
> kvp_transaction.state = HVUTIL_DEVICE_DYING;
> cancel_delayed_work_sync(&kvp_host_handshake_work);
> cancel_delayed_work_sync(&kvp_timeout_work);
> cancel_work_sync(&kvp_sendkey_work);
> hvutil_transport_destroy(hvt);
> - wait_for_completion(&release_event);
> +
> + if (wait)
> + wait_for_completion(&release_event);
> }
> diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
> index bcc03f0..3847f19 100644
> --- a/drivers/hv/hv_snapshot.c
> +++ b/drivers/hv/hv_snapshot.c
> @@ -396,9 +396,12 @@ hv_vss_init(struct hv_util_service *srv)
>
> void hv_vss_deinit(void)
> {
> + bool wait = hvt->dev_opened;
> +
> vss_transaction.state = HVUTIL_DEVICE_DYING;
> cancel_delayed_work_sync(&vss_timeout_work);
> cancel_work_sync(&vss_handle_request_work);
> hvutil_transport_destroy(hvt);
> - wait_for_completion(&release_event);
> + if (wait)
> + wait_for_completion(&release_event);
> }
> diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
> index c235a95..05e0648 100644
> --- a/drivers/hv/hv_utils_transport.c
> +++ b/drivers/hv/hv_utils_transport.c
> @@ -153,6 +153,7 @@ static int hvt_op_open(struct inode *inode, struct file *file)
>
> if (issue_reset)
> hvt_reset(hvt);
> + hvt->dev_opened = (hvt->mode == HVUTIL_TRANSPORT_CHARDEV) && !ret;
>
> mutex_unlock(&hvt->lock);
>
> @@ -182,6 +183,7 @@ static int hvt_op_release(struct inode *inode, struct file *file)
> * connects back.
> */
> hvt_reset(hvt);
> + hvt->dev_opened = false;
> mutex_unlock(&hvt->lock);
>

I'm not sure, but this may also be racy: what if we query the state
just before it gets reset here?
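One way to make that read well-defined, sketched under assumptions: sample dev_opened under the same hvt->lock that open and release hold, e.g. inside the locked section of hvutil_transport_destroy(). The struct and model_* functions below are a hypothetical user-space model, not the real struct hvutil_transport. Because the flag is cleared in the same critical section where the patch calls hvt_reset(), a locked read lands either wholly before a concurrent release (sees true, and the release will still signal) or wholly after it (sees false).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the transport: only what the locking needs. */
struct model_hvt {
	pthread_mutex_t lock;   /* stands in for hvt->lock */
	bool destroyed;         /* stands in for mode == HVUTIL_TRANSPORT_DESTROY */
	bool dev_opened;
};

static void model_init(struct model_hvt *h)
{
	pthread_mutex_init(&h->lock, NULL);
	h->destroyed = false;
	h->dev_opened = false;
}

/* open: refuse once destroy has started, otherwise mark opened */
static int model_open(struct model_hvt *h)
{
	int ret = 0;

	pthread_mutex_lock(&h->lock);
	if (h->destroyed)
		ret = -ENODEV;
	else
		h->dev_opened = true;
	pthread_mutex_unlock(&h->lock);
	return ret;
}

/* release: clear the flag where the patch calls hvt_reset() */
static void model_release(struct model_hvt *h)
{
	pthread_mutex_lock(&h->lock);
	assert(h->dev_opened);
	h->dev_opened = false;
	pthread_mutex_unlock(&h->lock);
}

/* destroy: mark destroyed and sample the flag in one critical section;
 * returns true if the caller must wait for the release */
static bool model_destroy(struct model_hvt *h)
{
	bool wait;

	pthread_mutex_lock(&h->lock);
	h->destroyed = true;
	wait = h->dev_opened;
	pthread_mutex_unlock(&h->lock);
	return wait;
}
```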

> if (mode_old == HVUTIL_TRANSPORT_DESTROY)
> diff --git a/drivers/hv/hv_utils_transport.h b/drivers/hv/hv_utils_transport.h
> index d98f522..9871283 100644
> --- a/drivers/hv/hv_utils_transport.h
> +++ b/drivers/hv/hv_utils_transport.h
> @@ -32,6 +32,7 @@ struct hvutil_transport {
> int mode; /* hvutil_transport_mode */
> struct file_operations fops; /* file operations */
> struct miscdevice mdev; /* misc device */
> + bool dev_opened; /* Is the device opened? */
> struct cb_id cn_id; /* CN_*_IDX/CN_*_VAL */
> struct list_head list; /* hvt_list */
> int (*on_msg)(void *, int); /* callback on new user message */

I think we can get away without introducing this new flag, e.g. by
replacing release_event with an atomic that holds the state
(open/closed). This would also eliminate the possible races above. I can
try prototyping a patch if you want me to.
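A rough sketch of one possible reading of that idea, with a single atomic carrying CLOSED/OPEN/DYING. The state names, model_* functions and the -EBUSY on a second open are hypothetical. deinit swaps in DYING unconditionally and the old value alone says whether a release is still outstanding; release's OPEN -> CLOSED compare-and-swap fails exactly when deinit got there first, so "release must signal" and "deinit must wait" always agree. The actual blocking would still need a wait queue or completion; the sketch only shows the state transitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-atomic replacement for dev_opened + release_event */
enum { DEV_CLOSED, DEV_OPEN, DEV_DYING };

static atomic_int dev_state = DEV_CLOSED;

static void model_reset(void)
{
	atomic_store(&dev_state, DEV_CLOSED);
}

/* open: CLOSED -> OPEN; fails if dying or (assumption) already open */
static int model_open(void)
{
	int expected = DEV_CLOSED;

	if (atomic_compare_exchange_strong(&dev_state, &expected, DEV_OPEN))
		return 0;
	return expected == DEV_DYING ? -ENODEV : -EBUSY;
}

/* release: OPEN -> CLOSED normally; if the CAS fails, deinit already
 * moved us to DYING and is waiting, so return true ("wake it up") */
static bool model_release(void)
{
	int expected = DEV_OPEN;

	if (atomic_compare_exchange_strong(&dev_state, &expected, DEV_CLOSED))
		return false;
	assert(expected == DEV_DYING);
	return true;
}

/* deinit: move to DYING; true means a release is pending, so wait */
static bool model_deinit(void)
{
	return atomic_exchange(&dev_state, DEV_DYING) == DEV_OPEN;
}
```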

Thanks,

--
Vitaly