Re: [PATCH v5] acpi: Fix CPU hot removal problem

From: Bjorn Helgaas
Date: Fri Sep 30 2011 - 01:21:18 EST


On Wed, Sep 28, 2011 at 9:41 PM, canquan.shen <shencanquan@xxxxxxxxxx> wrote:
>
> On 2011/9/29 11:14, Chen Gong wrote:
>>
>> ä 2011/9/24 14:10, canquan.shen åé:
>>>
>>> We run linux as a guest in Xen environment. When we used the xen tools
>>> (xm vcpu-set <n>) to hot add and remove vcpu to and from the guest, we
>>> encountered the failure on vcpu removal. We found the reason is that it
>>> did't go to really remove cpu in the cpu removal code path.
>>>
>>> This patch adds acpi_bus_hot_remove_device in
>>> acpi_process_hotplug_notify to
>>> fix this issue. With this patch, it works fine for us.
>>>
>>> Signed-off-by: Canquan Shen <shencanquan@xxxxxxxxxx>
>>> ---
>>> drivers/acpi/processor_driver.c | 13 +------------
>>> drivers/acpi/scan.c | 4 ++--
>>> include/acpi/acpi_bus.h | 1 +
>>> 3 files changed, 4 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/acpi/processor_driver.c
>>> b/drivers/acpi/processor_driver.c
>>> index a4e0f1b..8429688 100644
>>> --- a/drivers/acpi/processor_driver.c
>>> +++ b/drivers/acpi/processor_driver.c
>>> @@ -665,18 +665,7 @@ static void
>>> acpi_processor_hotplug_notify(acpi_handle handle,
>>> case ACPI_NOTIFY_EJECT_REQUEST:
>>> ACPI_DEBUG_PRINT((ACPI_DB_INFO,
>>> "received ACPI_NOTIFY_EJECT_REQUEST\n"));
>>> -
>>> - if (acpi_bus_get_device(handle, &device)) {
>>> - printk(KERN_ERR PREFIX
>>> - "Device don't exist, dropping EJECT\n");
>>> - break;
>>> - }
>>> - pr = acpi_driver_data(device);
>>> - if (!pr) {
>>> - printk(KERN_ERR PREFIX
>>> - "Driver data is NULL, dropping EJECT\n");
>>> - return;
>>> - }
>>> + acpi_bus_hot_remove_device(handle);
>>
>> As the description in __acpi_os_execute(in acpi_os_hotplug_execute),
>> /*
>> * We can't run hotplug code in keventd_wq/kacpid_wq/kacpid_notify_wq
>> * because the hotplug code may call driver .remove() functions,
>> * which invoke flush_scheduled_work/acpi_os_wait_events_complete
>> * to flush these workqueues.
>> */
>> If so, why not using following call:
>>
>> acpi_os_hotplug_execute(acpi_bus_hot_remove_device, acpi_device->handle);
>>
>>
>
> It is ok if using the acpi_os_hotplug_execute. but it is complex and more time for removal cpu because it is add to queue and some time the work will be called.
> I think that it is clear to call directly acpi_bus_hot_remove_device function in acpi_processor_hotplug_notify.

Chen, you're right that the CPU hot-remove notifier is running in a
workqueue, we are proposing that the notifier call
acpi_bus_hot_remove_device(), which calls the .remove() method, and
that can cause a deadlock if .remove() waits for the workqueue to be
flushed.

The usual way this deadlock happens is when the .remove() method uses
acpi_remove_notify_handler(). But the processor_driver .remove()
method doesn't do anything with notify handler registration; the
handler is registered/unregistered by the module init/exit functions.
So I don't think we need to use acpi_os_hotplug_execute() in this case
because I don't think there's a risk of deadlock.

However, if new CPU devices appear in the namespace after module-init,
I don't think processor_driver will handle them correctly. This looks
like just another artifact of our screwed-up ACPI hotplug handling.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/