Re: move hyperv CHANNELMSG_UNLOAD from crashed kernel to kdump kernel

From: Vitaly Kuznetsov
Date: Thu Dec 15 2016 - 05:54:12 EST


Olaf Hering <olaf@xxxxxxxxx> writes:

> On Thu, Dec 15, Vitaly Kuznetsov wrote:
>
>> I see a number of minor but at least one major issue against such a move:
>> At least for some Hyper-V versions (2012R2 for example)
>> CHANNELMSG_UNLOAD_RESPONSE is delivered to the CPU which initially sent
>> CHANNELMSG_REQUESTOFFERS and on kdump we may not have this CPU up as
>> we usually do kdump with nr_cpus=1 (and on the CPU which crashed).
>
> Since the kdump or kexec kernel will send the unload during boot I would
> expect the response to arrive where it was sent, independent from the
> number of cpus.
>

We actually need to read the reply and empty the message slot to make
unload happen. And reading on a different CPU may not work, see:

http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2016-December/097330.html

>> Minor issue is the necessity to preserve the information about
>> message/events pages across kexec.
>
> I guess this info is stored somewhere, and the relevant gfns can be
> preserved across kernels, if we try really hard.
>
> But after looking further at the involved code paths it seems that the
> implemented polling might be good enough to snatch the response. Was the
> mdelay(10) just an arbitrary decision?

I observed delays up to several seconds (!) before
CHANNELMSG_UNLOAD_RESPONSE is delivered.
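
To illustrate why a fixed short wait is not enough when the response can
take seconds, here is a minimal user-space sketch of polling in small steps
up to an overall deadline. It is not the driver code: sim_poll_until(),
struct sim_host and the fake clock are hypothetical names made up for this
example, and the 10 ms step only mirrors the mdelay(10) mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callbacks: "is the response there yet?" and "wait a bit". */
typedef bool (*sim_poll_fn)(void *ctx);
typedef void (*sim_delay_fn)(unsigned int ms, void *ctx);

/*
 * Poll ready() every step_ms until it returns true or timeout_ms has
 * elapsed. Returns the time waited in ms, or -1 on timeout.
 */
static long sim_poll_until(sim_poll_fn ready, sim_delay_fn delay, void *ctx,
                           unsigned int step_ms, unsigned int timeout_ms)
{
    unsigned int waited = 0;

    assert(ready && delay);
    if (step_ms == 0)
        step_ms = 1; /* avoid spinning forever without advancing time */

    for (;;) {
        if (ready(ctx))
            return (long)waited;
        if (waited >= timeout_ms)
            return -1;
        delay(step_ms, ctx);
        waited += step_ms;
    }
}

/* Fake host for the sketch: the response shows up at respond_at_ms. */
struct sim_host {
    unsigned int now_ms;
    unsigned int respond_at_ms;
};

static bool sim_response_ready(void *ctx)
{
    struct sim_host *h = ctx;

    return h->now_ms >= h->respond_at_ms;
}

/* Advances the fake clock instead of really sleeping. */
static void sim_fake_delay(unsigned int ms, void *ctx)
{
    struct sim_host *h = ctx;

    h->now_ms += ms;
}
```

With a response that arrives after 2.5 seconds, a 10 second deadline
succeeds while a 1 second deadline gives up, which is the failure mode a
too-short wait would hit.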

> I interpret the comments in vmbus_signal_eom such that the host may
> overwrite the response. Perhaps such a thing may happen during the mdelay?

No, (at least in theory) the host is never supposed to overwrite
messages; it waits for the guest to clear the slot and do the wrmsr.
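
The handshake described above can be modelled in plain user-space C. This is
only a sketch under assumptions: struct sim_slot, the sim_* functions and the
single-entry queue are invented for illustration, and sim_signal_eom() merely
stands in for the end-of-message MSR write; none of this is the actual
hypervisor or vmbus code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message types, for illustration only. */
enum sim_msg_type { SIM_MSG_NONE = 0, SIM_MSG_UNLOAD_RESPONSE = 1 };

/* One simulated per-CPU message slot shared by host and guest. */
struct sim_slot {
    uint32_t type;        /* SIM_MSG_NONE means the slot is free */
    bool     queued;      /* host holds one more message for this slot */
    uint32_t queued_type;
};

/*
 * Host side: write into a free slot, otherwise hold the message back.
 * An occupied slot is never overwritten. Returns true if the message
 * was delivered immediately.
 */
static bool sim_host_post(struct sim_slot *s, uint32_t type)
{
    assert(type != SIM_MSG_NONE);
    if (s->type != SIM_MSG_NONE) {
        assert(!s->queued); /* the sketch models a single pending message */
        s->queued = true;
        s->queued_type = type;
        return false;
    }
    s->type = type;
    return true;
}

/* Stand-in for the EOM write: host may now deliver the held message. */
static void sim_signal_eom(struct sim_slot *s)
{
    if (s->queued && s->type == SIM_MSG_NONE) {
        s->type = s->queued_type;
        s->queued = false;
    }
}

/* Guest side: read the message, free the slot, then signal EOM. */
static uint32_t sim_guest_consume(struct sim_slot *s)
{
    uint32_t type = s->type;

    if (type == SIM_MSG_NONE)
        return SIM_MSG_NONE;
    s->type = SIM_MSG_NONE;
    sim_signal_eom(s);
    return type;
}
```

In this model a response posted while the slot is still occupied only
becomes visible after the guest has consumed the earlier message, which is
why the slot has to be read and emptied on the CPU it belongs to.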

--
Vitaly