Re: [PATCH 1/1] Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs

From: Vitaly Kuznetsov
Date: Tue May 16 2023 - 05:27:37 EST


Michael Kelley <mikelley@xxxxxxxxxxxxx> writes:

> vmbus_wait_for_unload() may be called in the panic path after other
> CPUs are stopped. vmbus_wait_for_unload() currently loops through
> online CPUs looking for the UNLOAD response message. But the values of
> CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
> to stop the other CPUs, and in one of the paths the stopped CPUs
> are removed from cpu_online_mask. This removal happens in both
> x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload()
> only checks the panic'ing CPU, and misses the UNLOAD response message
> except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload()
> eventually times out, but only after waiting 100 seconds.
>
> Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
> The cpu_present_mask is not modified by stopping the other CPUs in the
> panic path, nor should it be. Furthermore, the synic_message_page
> being checked in vmbus_wait_for_unload() is allocated in
> hv_synic_alloc() for all present CPUs. So looping through the
> present CPUs is more consistent.
>
> For additional safety, also add a check for the message_page being
> NULL before looking for the UNLOAD response message.
>
> Reported-by: John Starks <jostarks@xxxxxxxxxxxxx>
> Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios")

I see you Cc:ed stable@ on the patch, should we also add

Cc: stable@xxxxxxxxxxxxxxx

here explicitly so it gets picked up by various stable backporting
scritps? I guess Wei can do it when picking the patch to the queue...

> Signed-off-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
> ---
> drivers/hv/channel_mgmt.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 007f26d..df2ba20 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -829,11 +829,14 @@ static void vmbus_wait_for_unload(void)
> if (completion_done(&vmbus_connection.unload_event))
> goto completed;
>
> - for_each_online_cpu(cpu) {
> + for_each_present_cpu(cpu) {
> struct hv_per_cpu_context *hv_cpu
> = per_cpu_ptr(hv_context.cpu_context, cpu);
>
> page_addr = hv_cpu->synic_message_page;
> + if (!page_addr)
> + continue;
> +

In theory, synic_message_page for all present CPUs is permanently
assigned in hv_synic_alloc() and we fail the whole thing if any of these
allocations fail so page_addr == NULL is likely impossible today
but there's certainly no harm in having this extra check here, this is
not a hotpath.

> msg = (struct hv_message *)page_addr
> + VMBUS_MESSAGE_SINT;
>
> @@ -867,11 +870,14 @@ static void vmbus_wait_for_unload(void)
> * maybe-pending messages on all CPUs to be able to receive new
> * messages after we reconnect.
> */
> - for_each_online_cpu(cpu) {
> + for_each_present_cpu(cpu) {
> struct hv_per_cpu_context *hv_cpu
> = per_cpu_ptr(hv_context.cpu_context, cpu);
>
> page_addr = hv_cpu->synic_message_page;
> + if (!page_addr)
> + continue;
> +
> msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
> msg->header.message_type = HVMSG_NONE;
> }

Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>

--
Vitaly