Re: [PATCH v4] acpi: Fix CPU hot removal problem

From: Bjorn Helgaas
Date: Fri Sep 23 2011 - 10:16:35 EST


On Fri, Sep 23, 2011 at 1:49 AM, canquan.shen <shencanquan@xxxxxxxxxx> wrote:
> On 2011/9/23 0:53, Bjorn Helgaas wrote:
>>
>> On Wed, Sep 14, 2011 at 8:56 PM, Bjorn Helgaas<bhelgaas@xxxxxxxxxx>
>>  wrote:
>>>
>>> On Wed, Sep 14, 2011 at 7:06 PM, canquan.shen<shencanquan@xxxxxxxxxx>
>>>  wrote:
>>>>
>>>> We run linux as a guest in Xen environment. When we used the xen tools
>>>> (xm vcpu-set<n>) to hot add and remove vcpu to and from the guest, we
>>>> encountered the failure on vcpu removal. We found the reason is that it
>>>> didn't go to really remove cpu in the cpu removal code path.
>>>>
>>>> This patch adds acpi_bus_trim in acpi_process_hotplug_notify to fix this
>>>> issue. With this patch, it works fine for us.
>>>>
>>>> Signed-off-by:Canquan Shen<shencanquan@xxxxxxxxxx>
>>>
>>> Reviewed-by: Bjorn Helgaas<bhelgaas@xxxxxxxxxx>
>>
>> On second thought, let's think about this a bit more.
>>
>> As I mentioned before, I have a long-term goal to move the hotplug
>> flow out of drivers and into the ACPI core.  That will be easier if
>> the code in the drivers is as generic as possible.
>>
>> The dock and acpiphp hot-remove code calls acpi_bus_trim(), then
>> evaluates _EJ0.  The core acpi_bus_hot_remove_device() function
>> already does both acpi_bus_trim() and _EJ0.  This function is
>> currently only used when we write to sysfs "eject" files, but I wonder
>> if we should use it in acpi_processor_hotplug_notify() as well.
>>
>> That would get us one step closer to removing this gunk from the
>> drivers and having acpi_bus_notify() look something like this:
>>
>>     case ACPI_NOTIFY_EJECT_REQUEST:
>>         driver->ops.remove(device);
>>         acpi_bus_hot_remove_device(device);
>>         break;
>>
>> There is a description of a CPU hot-remove that does include _EJ0
>> methods in the "DIG64 Hot-Plug&  Partitioning Flows Specification"
>> [1], sec 2.2.4.  I know this document is Itanium-oriented, but this
>> part seems fairly generic and it's the only description of the process
>> I've seen so far.
>>
>> So would using acpi_bus_hot_remove_device() instead of acpi_bus_trim()
>> also solve your problem, Canquan?
>>
> Yes. It can solve my problem.
> I fully aggree to replace acpi_bus_hot_remove_device() to acpi_bus_trim().
> Initially I insert the acpi_bus_hot_remove_device() in acpi_bus_notify
> function . lately I think I should give a chance for user,and so send
>  KOBJ_OFFLINE message to the udvev module.
>
> But why add the  driver->ops.remove(device) before
> acpi_bus_hot_remove_device(device).  it can be called in
> acpi_bus_hot_remove_device code path as bellowing:
>    acpi_bus_trim
>        acpi_bus_remove
>            device_release_driver
>                __device_release_driver
>                        acpi_device_remove
>                                acpi_drv->ops.remove

OK. Maybe the future acpi_bus_notify() code would be even simpler.
The point is that the ACPI core should handle the notification, call
the driver's .remove() method, and do whatever namespace cleanup is
required (i.e., acpi_bus_trim()). None of this should be in the
driver itself.

Can you re-post your patch, using acpi_bus_hot_remove_device() instead
of acpi_bus_trim()? Please include Khalid's tweak, too, so we don't
print warnings for CPUs that don't supply _EJ0 methods.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/