Re: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done

From: Marek Marczykowski-Górecki
Date: Thu Oct 28 2021 - 16:17:08 EST


On Thu, Oct 28, 2021 at 12:59:52PM +0200, Juergen Gross wrote:
> When running as PVH or HVM guest with actual memory < max memory the
> hypervisor is using "populate on demand" in order to allow the guest
> to balloon down from its maximum memory size. For this to work
> correctly the guest must not touch more memory pages than its target
> memory size as otherwise the PoD cache will be exhausted and the guest
> is crashed as a result of that.
>
> In extreme cases ballooning down might not be finished today before
> the init process is started, which can consume lots of memory.
>
> In order to avoid random boot crashes in such cases, add a late init
> call to wait for ballooning down having finished for PVH/HVM guests.
>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Reported-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>

The initial balloon down can fail (state == BP_ECANCELED). In that case
current_credit() never drops to zero, so this loop waits indefinitely. I
think it should rather report the failure (and panic? It is similar to an
OOM before PID 1 starts, so rather hard to recover from) instead of hanging.

Anyway, it does fix the boot crashes.

> ---
> drivers/xen/balloon.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 3a50f097ed3e..d19b851c3d3b 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -765,3 +765,23 @@ static int __init balloon_init(void)
>  	return 0;
>  }
>  subsys_initcall(balloon_init);
> +
> +static int __init balloon_wait_finish(void)
> +{
> +	if (!xen_domain())
> +		return -ENODEV;
> +
> +	/* PV guests don't need to wait. */
> +	if (xen_pv_domain() || !current_credit())
> +		return 0;
> +
> +	pr_info("Waiting for initial ballooning down having finished.\n");
> +
> +	while (current_credit())
> +		schedule_timeout_interruptible(HZ / 10);
> +
> +	pr_info("Initial ballooning down finished.\n");
> +
> +	return 0;
> +}
> +late_initcall_sync(balloon_wait_finish);
> --
> 2.26.2
>

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
