Re: [PATCH v2 1/4] powerpc/mobility: Wait for memory transfer to complete

From: Nathan Lynch
Date: Tue Jun 21 2022 - 12:52:44 EST


Laurent Dufour <ldufour@xxxxxxxxxxxxx> writes:

> In pseries_migration_partition(), loop until the memory transfer is
> complete. This way the calling drmgr process will not exit earlier,
> allowing callbacks to be run only once the migration is fully completed.
>
> If reading the VASI state is done after the hypervisor has completed the
> migration, the HCALL is returning H_PARAMETER. We can safely assume that
> the memory transfer is achieved if this happens.
>
> This will also allow to manage the NMI watchdog state in the next commits.
>
> Signed-off-by: Laurent Dufour <ldufour@xxxxxxxxxxxxx>
> ---
> arch/powerpc/platforms/pseries/mobility.c | 42 +++++++++++++++++++++--
> 1 file changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 78f3f74c7056..179bbd4ae881 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle)
>  	return ret;
>  }
>
> +static void wait_for_vasi_session_completed(u64 handle)
> +{
> +	unsigned long state = 0;
> +	int ret;
> +
> +	pr_info("waiting for memory transfert to complete...\n");
> +	/*
> +	 * Wait for transition from H_VASI_RESUMED to
> +	 * H_VASI_COMPLETED. Treat anything else as an error.

"Treat anything else as an error" does not match the code: -EINVAL
(H_PARAMETER, returned once the stream handle has expired) is treated
as successful completion rather than as an error. That sentence should
be dropped from this comment.
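
To make the mismatch concrete, here is a minimal userspace sketch of the
loop's exit conditions, with a comment worded to match them. It is only
an illustration, not proposed patch content: classify_poll(), enum
poll_verdict and the SKETCH_* constants are hypothetical names of mine,
and the numeric values are placeholders rather than the kernel's
definitions.

```c
#include <errno.h>

/* Placeholder state values for illustration only (assumption). */
enum { SKETCH_VASI_RESUMED = 5, SKETCH_VASI_COMPLETED = 6 };

/* Hypothetical verdicts: stop with success, keep polling, stop with error. */
enum poll_verdict { POLL_DONE, POLL_AGAIN, POLL_ERROR };

/*
 * Wait for transition from H_VASI_RESUMED to H_VASI_COMPLETED.
 * -EINVAL (H_PARAMETER, the stream handle has expired) also means
 * the transfer is complete. Any other error or state ends the wait.
 */
static enum poll_verdict classify_poll(int ret, unsigned long state)
{
	/* Completed, or already cleaned up by the hypervisor. */
	if (ret == -EINVAL || (!ret && state == SKETCH_VASI_COMPLETED))
		return POLL_DONE;

	/* Any other hcall failure. */
	if (ret)
		return POLL_ERROR;

	/* The only state in which polling continues is "resumed". */
	if (state != SKETCH_VASI_RESUMED)
		return POLL_ERROR;

	return POLL_AGAIN;
}
```

Of the three ways out of the loop, one is a success path reached through
an error return, which is why the "anything else" wording is misleading.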

> +	 */
> +	while (true) {
> +		ret = poll_vasi_state(handle, &state);
> +
> +		/*
> +		 * If the memory transfer is already complete and the migration
> +		 * has been cleaned up by the hypervisor, H_PARAMETER is return,
> +		 * which is translate in EINVAL by poll_vasi_state().
> +		 */
> +		if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) {
> +			pr_info("memory transfert completed.\n");
> +			break;
> +		}
> +
> +		if (ret) {
> +			pr_err("H_VASI_STATE return error (%d)\n", ret);
> +			break;
> +		}
> +
> +		if (state != H_VASI_RESUMED) {
> +			pr_err("unexpected H_VASI_STATE result %lu\n", state);
> +			break;
> +		}
> +
> +		msleep(500);
> +	}
> +}
> +
>  static void prod_single(unsigned int target_cpu)
>  {
>  	long hvrc;
> @@ -673,9 +710,10 @@ static int pseries_migrate_partition(u64 handle)
>  	vas_migration_handler(VAS_SUSPEND);
>
>  	ret = pseries_suspend(handle);
> -	if (ret == 0)
> +	if (ret == 0) {
>  		post_mobility_fixup();
> -	else
> +		wait_for_vasi_session_completed(handle);
> +	} else
>  		pseries_cancel_migration(handle, ret);
>
>  	vas_migration_handler(VAS_RESUME);

While this may noticeably lengthen the time it takes for drmgr to return
from the system call, it seems like the right thing to do. The migration
should not be considered complete until the VASI stream poll yields a
"Complete" status. We also still need to add code to send gratuitous ARPs
through ibmveth interfaces while waiting for the transition, and that
work would likely build on this.

I believe the HMC and associated software can cope with the drmgr
command taking a longer time to return in cases where the partition
memory needs a while to completely sync to the destination.

Apart from the small critique on the comment in
wait_for_vasi_session_completed(), this looks fine to me.

Reviewed-by: Nathan Lynch <nathanl@xxxxxxxxxxxxx>