Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

From: Radim Krcmar
Date: Fri Mar 18 2016 - 11:20:49 EST


2016-03-18 13:33+0100, Vitaly Kuznetsov:
> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
> delivered to CPU0 regardless of what CPU we're sending CHANNELMSG_UNLOAD
> from. vmbus_wait_for_unload() doesn't account for the fact that in case
> we're crashing on some other CPU and CPU0 is still alive and operational
> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
> vmbus_connection.unload_event, our wait on the current CPU will never
> end.

(Any chance of learning about this behavior from the spec?)

> Do the following:
> 1) Check for completion_done() in the loop. In case interrupt handler is
> still alive we'll get the confirmation we need.
>
> 2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE will be
> delivered there. We can race with still-alive interrupt handler doing
> the same but we don't care as we're checking completion_done() now.

(Yeah, seems better than hv_setup_vmbus_irq(NULL) or other hacks.)

> 3) Cleanup message pages on all CPUs. This is required (at least for the
> current CPU as we're clearing CPU0 messages now but we may want to bring
> up additional CPUs on crash) as new messages won't be delivered till we
> consume what's pending. On boot we'll place message pages somewhere else
> and we won't be able to read stale messages.

What if HV has already set the pending message bit on the current
message? Do we get any guarantee that clearing once after
UNLOAD_RESPONSE is enough?

> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> ---

I have a question about NULL below. (Parenthesised rants aren't related
to the r-b tag. ;)

> drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
> 1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index b10e8f74..5f37057 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
>
> static void vmbus_wait_for_unload(void)
> {
> - int cpu = smp_processor_id();
> - void *page_addr = hv_context.synic_message_page[cpu];
> + int cpu;
> + void *page_addr = hv_context.synic_message_page[0];
> struct hv_message *msg = (struct hv_message *)page_addr +
> VMBUS_MESSAGE_SINT;
> struct vmbus_channel_message_header *hdr;
> bool unloaded = false;
>
> - while (1) {
> + /*
> + * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
> + * crashing on a different CPU let's hope that IRQ handler on CPU0 is
> + * still functional and vmbus_unload_response() will complete
> + * vmbus_connection.unload_event. If not, the last thing we can do is
> + * read message page for CPU0 regardless of what CPU we're on.
> + */
> + while (!unloaded) {

(I'd feel a bit safer if this was bounded by some timeout, but all
scenarios where this would make a difference are implausible ...
queue_work() not working while the rest is fine is the best one.)
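
To make the timeout idea concrete, here is a rough user-space sketch of a
bounded polling loop. It is only a model, not the patch: struct fake_unload,
fake_done() and wait_for_unload_bounded() are hypothetical names, fake_done()
stands in for the completion_done()/message-page checks, and the place where
the kernel loop would call mdelay() just advances a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "has the unload response arrived yet?". */
struct fake_unload {
	int polls_until_done;	/* response shows up on this poll */
	int polls;		/* polls performed so far */
};

/* Models completion_done() or the CPU0 message-page check. */
static bool fake_done(struct fake_unload *u)
{
	return u->polls++ >= u->polls_until_done;
}

/*
 * Poll until the response arrives or timeout_ms has elapsed.
 * Returns true on success, false when the wait timed out.
 */
static bool wait_for_unload_bounded(struct fake_unload *u,
				    unsigned int timeout_ms,
				    unsigned int step_ms)
{
	unsigned int waited = 0;

	assert(step_ms > 0);

	for (;;) {
		if (fake_done(u))
			return true;
		if (waited >= timeout_ms)
			return false;
		/* The kernel loop would mdelay(step_ms) here. */
		waited += step_ms;
	}
}
```

What the caller should do on a timeout in the crash path is a separate
question that this sketch does not try to answer.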

> + if (completion_done(&vmbus_connection.unload_event)) {
> + unloaded = true;

(No need to set unloaded when you break.)

> + break;
> + }
> +
> if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
> mdelay(10);
> continue;
> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)

(I'm not a huge fan of the unloaded variable; what about remembering the
header/msgtype here ...

> unloaded = true;
>
> vmbus_signal_eom(msg);

and checking its value here?)
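
Roughly what I mean, as a user-space model rather than a drop-in change: the
loop condition tests the last message type seen instead of a separate flag.
The enum and drain_until_unload() are made-up names for illustration, and the
array stands in for successive reads of the message slot.

```c
#include <stddef.h>

/* Hypothetical message types; not the real HVMSG_* / CHANNELMSG_* values. */
enum model_msgtype {
	MODEL_MSG_NONE,
	MODEL_MSG_OTHER,
	MODEL_MSG_UNLOAD_RESPONSE,
};

/*
 * Walk successive slot reads until the unload response has been seen.
 * Returns how many reads were consumed, including the response itself.
 */
static size_t drain_until_unload(const enum model_msgtype *msgs, size_t n)
{
	enum model_msgtype last = MODEL_MSG_NONE;
	size_t consumed = 0;

	/* No "unloaded" flag: the remembered type is the loop condition. */
	while (last != MODEL_MSG_UNLOAD_RESPONSE && consumed < n) {
		last = msgs[consumed];
		consumed++;
		/* The real loop would delay on NONE and signal EOM otherwise. */
	}
	return consumed;
}
```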

> + }
>
> - if (unloaded)
> - break;
> + /*
> + * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
> + * maybe-pending messages on all CPUs to be able to receive new
> + * messages after we reconnect.
> + */
> + for_each_online_cpu(cpu) {
> + page_addr = hv_context.synic_message_page[cpu];

Can't this be NULL?

> + msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
> + msg->header.message_type = HVMSG_NONE;
> }

(And this block belongs in a separate function. ;])
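
For illustration only, the cleanup could look something like the following if
it were split out and guarded against an unset page. This is a user-space
model with invented names (struct model_msg, the MODEL_* constants,
clear_pending_messages()); whether synic_message_page[cpu] can actually be
NULL at this point is exactly the open question above, so the guard is an
assumption, not a claim about the driver.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS		4
#define MODEL_SINT_SLOTS	16
#define MODEL_VMBUS_SINT	2	/* assumed slot index for this model */
#define MODEL_MSG_NONE		0u

/* Hypothetical, stripped-down message slot. */
struct model_msg {
	unsigned int message_type;
};

/*
 * Mark the VMBus slot on every CPU's message page as empty.
 * Pages that were never set up (NULL) are skipped.
 * Returns the number of pages that were cleared.
 */
static int clear_pending_messages(struct model_msg *pages[], size_t ncpus)
{
	int cleared = 0;

	assert(pages != NULL);

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		struct model_msg *page = pages[cpu];

		if (page == NULL)
			continue;

		page[MODEL_VMBUS_SINT].message_type = MODEL_MSG_NONE;
		cleared++;
	}
	return cleared;
}
```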