Re: [PATCH 44/51] xen, balloon: Fix CPU hotplug callback registration
From: Boris Ostrovsky
Date: Wed Feb 05 2014 - 20:19:55 EST
----- srivatsa.bhat@xxxxxxxxxxxxxxxxxx wrote:
> Subsystems that want to register CPU hotplug callbacks, as well as
> perform initialization for the CPUs that are already online, often
> do it as shown below:
>
> get_online_cpus();
>
> for_each_online_cpu(cpu)
> init_cpu(cpu);
>
> register_cpu_notifier(&foobar_cpu_notifier);
>
> put_online_cpus();
>
> This is wrong, since it is prone to ABBA deadlocks involving the
> cpu_add_remove_lock and the cpu_hotplug.lock (when running
> concurrently with CPU hotplug operations).
>
> Interestingly, the balloon code in xen can actually prevent double
> initialization and hence can use the following simplified form of
> callback registration:
>
> register_cpu_notifier(&foobar_cpu_notifier);
>
> get_online_cpus();
>
> for_each_online_cpu(cpu)
> init_cpu(cpu);
>
> put_online_cpus();
>
> A hotplug operation that occurs between registering the notifier and
> calling get_online_cpus() won't disrupt anything, because the code
> takes care to perform the memory allocations only once.
>
> So reorganize the balloon code in xen this way to fix the deadlock
> with callback registration.
>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
> ---
>
> drivers/xen/balloon.c | 35 +++++++++++++++++++++++------------
> 1 file changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 37d06ea..afe1a3f 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -592,19 +592,29 @@ static void __init balloon_add_region(unsigned long start_pfn,
> }
> }
>
> +static int alloc_balloon_scratch_page(int cpu)
> +{
> + if (per_cpu(balloon_scratch_page, cpu) != NULL)
> + return 0;
> +
> + per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> + if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> + pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +
> static int balloon_cpu_notify(struct notifier_block *self,
> unsigned long action, void *hcpu)
> {
> int cpu = (long)hcpu;
> switch (action) {
> case CPU_UP_PREPARE:
> - if (per_cpu(balloon_scratch_page, cpu) != NULL)
> - break;
> - per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> - if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> - pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> + if (alloc_balloon_scratch_page(cpu))
> return NOTIFY_BAD;
> - }
> break;
> default:
> break;
> @@ -624,15 +634,16 @@ static int __init balloon_init(void)
> return -ENODEV;
>
> if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> - for_each_online_cpu(cpu)
> - {
> - per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> - if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> - pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> + register_cpu_notifier(&balloon_cpu_notifier);
> +
> + get_online_cpus();
> + for_each_online_cpu(cpu) {
> + if (alloc_balloon_scratch_page(cpu)) {
> + put_online_cpus();
> return -ENOMEM;
Not that the original code was doing a particularly thorough job of cleaning up on allocation failure, but if it couldn't get memory it would not register the notifier. So perhaps you should unregister it before returning here.
I am also not sure how we were susceptible to the deadlock here, since we didn't call get_online_cpus(). (We probably should have, but then the commit description should say so.)
-boris
> }
> }
> - register_cpu_notifier(&balloon_cpu_notifier);
> + put_online_cpus();
> }
>
> pr_info("Initialising balloon driver\n");
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/