RE: [PATCH v2 1/1] Drivers: hv: vmbus: Increase wait time for VMbus unload
From: Vitaly Kuznetsov
Date: Wed Apr 21 2021 - 04:25:01 EST
Michael Kelley <mikelley@xxxxxxxxxxxxx> writes:
> From: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> Sent: Tuesday, April 20, 2021 2:32 AM
>>
>> Michael Kelley <mikelley@xxxxxxxxxxxxx> writes:
>>
>> > When running in Azure, disks may be connected to a Linux VM with
>> > read/write caching enabled. If a VM panics and issues a VMbus
>> > UNLOAD request to Hyper-V, the response is delayed until all dirty
>> > data in the disk cache is flushed. In extreme cases, this flushing
>> > can take tens of seconds, depending on the disk speed and the amount
>> > of dirty data. If kdump is configured for the VM, the current 10-second
>> > timeout in vmbus_wait_for_unload() may be exceeded, and the UNLOAD
>> > complete message may arrive well after the kdump kernel is already
>> > running, causing problems. Note that no problem occurs if kdump is
>> > not enabled because Hyper-V waits for the cache flush before doing
>> > a reboot through the BIOS/UEFI code.
>> >
>> > Fix this problem by increasing the timeout in vmbus_wait_for_unload()
>> > to 100 seconds. Also output periodic messages so that if anyone is
>> > watching the serial console, they won't think the VM is completely
>> > hung.
>> >
>> > Fixes: 911e1987efc8 ("Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload")
>> > Signed-off-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
>> > ---
>> >
>> > Changed in v2: Fixed silly error in the argument to mdelay()
>> >
>> > ---
>> > drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
>> > 1 file changed, 25 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> > index f3cf4af..ef4685c 100644
>> > --- a/drivers/hv/channel_mgmt.c
>> > +++ b/drivers/hv/channel_mgmt.c
>> > @@ -755,6 +755,12 @@ static void init_vp_index(struct vmbus_channel *channel)
>> > free_cpumask_var(available_mask);
>> > }
>> >
>> > +#define UNLOAD_DELAY_UNIT_MS 10 /* 10 milliseconds */
>> > +#define UNLOAD_WAIT_MS (100*1000) /* 100 seconds */
>> > +#define UNLOAD_WAIT_LOOPS (UNLOAD_WAIT_MS/UNLOAD_DELAY_UNIT_MS)
>> > +#define UNLOAD_MSG_MS (5*1000) /* Every 5 seconds */
>> > +#define UNLOAD_MSG_LOOPS (UNLOAD_MSG_MS/UNLOAD_DELAY_UNIT_MS)
>> > +
>> > static void vmbus_wait_for_unload(void)
>> > {
>> > int cpu;
>> > @@ -772,12 +778,17 @@ static void vmbus_wait_for_unload(void)
>> > * vmbus_connection.unload_event. If not, the last thing we can do is
>> > * read message pages for all CPUs directly.
>> > *
>> > - * Wait no more than 10 seconds so that the panic path can't get
>> > - * hung forever in case the response message isn't seen.
>> > + * Wait up to 100 seconds since an Azure host must write back any dirty
>> > + * data in its disk cache before the VMbus UNLOAD request will
>> > + * complete. This flushing has been empirically observed to take up
>> > + * to 50 seconds in cases with a lot of dirty data, so allow
>> > + * additional leeway for slower flushes and for inaccuracies in
>> > + * mdelay(). But eventually time out so
>> > + * that the panic path can't get hung forever in case the response
>> > + * message isn't seen.
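[Since the quote above trims the body of vmbus_wait_for_unload(), here is a
minimal sketch of how the new macros plausibly drive the wait: up to 10,000
iterations of a 10 ms mdelay() (100 seconds total), with a progress message
every 500 iterations (every 5 seconds). The exact loop shape below is
reconstructed from the macro names and the commit message, not quoted from
the patch itself:]

	int i;

	for (i = 1; i <= UNLOAD_WAIT_LOOPS; i++) {
		/* Stop as soon as the UNLOAD response has been seen */
		if (completion_done(&vmbus_connection.unload_event))
			break;

		/* (existing per-CPU message page scan elided; see the
		 * "read message pages for all CPUs directly" comment above)
		 */

		mdelay(UNLOAD_DELAY_UNIT_MS);

		/* Reassure anyone watching the serial console */
		if ((i % UNLOAD_MSG_LOOPS) == 0)
			pr_notice("Waiting for VMBus UNLOAD to complete\n");
	}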
>>
>> I vaguely remember debugging cases where CHANNELMSG_UNLOAD_RESPONSE never
>> arrived; it was kind of pointless to proceed to kexec, as attempts to
>> reconnect VMbus devices were failing (no devices were offered after
>> CHANNELMSG_REQUESTOFFERS, AFAIR). Would it maybe make sense to just do an
>> emergency reboot instead of proceeding to kexec when this happens? Just
>> wondering.
>>
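
[For reference, the alternative being suggested would amount to something
like the following sketch at the timeout-expiry point in
vmbus_wait_for_unload(). emergency_restart() is the existing kernel helper
from <linux/reboot.h>; the check itself is hypothetical and not part of the
patch under review:]

	/* Hypothetical: if we time out without ever seeing
	 * CHANNELMSG_UNLOAD_RESPONSE, reboot immediately rather than
	 * continuing into the kdump/kexec path.
	 */
	if (!completion_done(&vmbus_connection.unload_event)) {
		pr_emerg("VMBus UNLOAD response not received; rebooting\n");
		emergency_restart();
	}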
>
> Yes, I think there have been (and maybe still are) situations where we don't
> ever get the UNLOAD response. But there have been bugs fixed in Hyper-V
> that I think make that less likely. There's also an unfixed (and maybe not fixable)
> problem when not operating in STIMER Direct Mode, where an old-style
> timer message can block the UNLOAD response message. But as the world
> moves forward to later kernel versions that use STIMER Direct Mode, that
> also becomes less likely. So my inclination is to continue on the normal
> execution path, even if the UNLOAD response message isn't
> received. Maybe we just didn't wait quite long enough (even at 100 seconds).
> It's a judgment call, and it's not clear to me that doing an emergency reboot
> is really any better.
>
> As background work for this patch, we also discovered another bug in Hyper-V.
> If the kdump kernel runs and does a VMbus INITIATE_CONTACT while the
> UNLOAD is still in progress, the Hyper-V code is supposed to wait for the UNLOAD
> to complete, and then commence the VMbus version negotiation. But it
> doesn't do that -- it finally sends the UNLOAD response, but never does the
> version negotiation, so the kdump kernel hangs forever. The Hyper-V team
> plans to fix this, and hopefully we'll get a patch deployed in Azure, which
> will eliminate one more scenario where the kdump kernel doesn't succeed.
>
Ah, ok. If bugs in Hyper-V/Azure are being fixed, then it seems
reasonable to keep the current logic (proceeding to kexec even when we
didn't receive the UNLOAD response). Thanks for the additional info!
--
Vitaly