Re: [PATCH V1 1/2] Xen acpi memory hotplug driver

From: Konrad Rzeszutek Wilk
Date: Wed Dec 05 2012 - 12:43:07 EST


On Fri, Nov 30, 2012 at 03:08:02AM +0000, Liu, Jinsong wrote:
> Konrad Rzeszutek Wilk wrote:
> > On Wed, Nov 21, 2012 at 11:45:04AM +0000, Liu, Jinsong wrote:
> >>> From 630c65690c878255ce71e7c1172338ed08709273 Mon Sep 17 00:00:00
> >>> 2001
> >> From: Liu Jinsong <jinsong.liu@xxxxxxxxx>
> >> Date: Tue, 20 Nov 2012 21:14:37 +0800
> >> Subject: [PATCH 1/2] Xen acpi memory hotplug driver
> >>
> >> Xen acpi memory hotplug consists of 2 logic components:
> >> Xen acpi memory hotplug driver and Xen hypercall.
> >>
> >> This patch implement Xen acpi memory hotplug driver. When running
> >> under xen platform, Xen driver will early occupy (so native driver
> >
> > How will it 'early occupy'? Can you spell it out here please?
>
> Sure, will add it like
> 'When running under xen platform, at booting stage xen memory hotplug driver will early occupy via subsys_initcall (earlier than native module_init), so xen driver will take effect and native driver will be blocked'.

OK.
>
> >
> >> will be blocked). When acpi memory notify OSPM, xen driver will take
> >> effect, adding related memory device and parsing memory information.
> >>
> >> Signed-off-by: Liu Jinsong <jinsong.liu@xxxxxxxxx>
> >> ---
> >> drivers/xen/Kconfig | 11 +
> >> drivers/xen/Makefile | 1 +
> >> drivers/xen/xen-acpi-memhotplug.c | 383
> >> +++++++++++++++++++++++++++++++++++++ 3 files changed, 395
> >> insertions(+), 0 deletions(-) create mode 100644
> >> drivers/xen/xen-acpi-memhotplug.c
> >>
> >> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> >> index 126d8ce..abd0396 100644
> >> --- a/drivers/xen/Kconfig
> >> +++ b/drivers/xen/Kconfig
> >> @@ -206,4 +206,15 @@ config XEN_MCE_LOG
> >> Allow kernel fetching MCE error from Xen platform and
> >> converting it into Linux mcelog format for mcelog tools
> >>
> >> +config XEN_ACPI_MEMORY_HOTPLUG
> >> + bool "Xen ACPI memory hotplug"
> >
> > There should be a way to make this a module.
>
> I have some concerns to make it a module:
> 1. xen and native memhotplug driver both work as module, while we need early load xen driver.
> 2. if possible, a xen stub driver may solve load sequence issue, but it may involve other issues
> * if xen driver load then unload, native driver may have chance to load successfully;

The stub driver would still "occupy" the ACPI bus for the memory hotplug PnP, so
I think this would not be a problem.

> * if xen driver load --> unload --> load again, then it will lose hotplug notification during unload period;

Sure. But I think we can do it with this driver? After all the function of
it is to just tell the firmware to turn on/off sockets - and if we miss
one notification we won't take advantage of the power savings - but we
can do that later on.


> * if xen driver load --> unload --> load again, then it will re-add all memory devices, but the handle for 'booting memory device' and 'hotplug memory device' are different while we have no way to distinguish these 2 kind of devices.

Wouldn't the stub driver hold onto that?

>
> IMHO I think to make xen hotplug logic as module may involves unexpected result. Is there any obvious advantages of doing so? after all we have provided config choice to user. Thoughts?

Yes, it becomes a module - which is what we want.

>
> >
> >
> >> + depends on XEN_DOM0 && X86_64 && ACPI
> >> + default n
> >> + help
> >> + This is Xen acpi memory hotplug.
> > ^^^^ -> ACPI
> >
> >> +
> >> + Currently Xen only support acpi memory hot-add. If you want
> > ^^^^-> ACPI
> >
> >> + to hot-add memory at runtime (the hot-added memory cannot be
> >> + removed until machine stop), select Y here, otherwise select N. +
> >> endmenu
> >> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> >> index 7435470..c339eb4 100644
> >> --- a/drivers/xen/Makefile
> >> +++ b/drivers/xen/Makefile
> >> @@ -30,6 +30,7 @@ obj-$(CONFIG_XEN_MCE_LOG) += mcelog.o
> >> obj-$(CONFIG_XEN_PCIDEV_BACKEND) += xen-pciback/
> >> obj-$(CONFIG_XEN_PRIVCMD) += xen-privcmd.o
> >> obj-$(CONFIG_XEN_ACPI_PROCESSOR) += xen-acpi-processor.o
> >> +obj-$(CONFIG_XEN_ACPI_MEMORY_HOTPLUG) += xen-acpi-memhotplug.o
> >> xen-evtchn-y := evtchn.o xen-gntdev-y := gntdev.o
> >> xen-gntalloc-y := gntalloc.o
> >> diff --git a/drivers/xen/xen-acpi-memhotplug.c
> >> b/drivers/xen/xen-acpi-memhotplug.c new file mode 100644 index
> >> 0000000..f0c7990 --- /dev/null
> >> +++ b/drivers/xen/xen-acpi-memhotplug.c
> >> @@ -0,0 +1,383 @@
> >> +/*
> >> + * Copyright (C) 2012 Intel Corporation
> >> + * Author: Liu Jinsong <jinsong.liu@xxxxxxxxx>
> >> + * Author: Jiang Yunhong <yunhong.jiang@xxxxxxxxx> + *
> >> + * This program is free software; you can redistribute it and/or
> >> modify + * it under the terms of the GNU General Public License as
> >> published by + * the Free Software Foundation; either version 2 of
> >> the License, or (at + * your option) any later version.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> but + * WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE
> >> or + * NON INFRINGEMENT. See the GNU General Public License for
> >> more + * details. + */
> >> +
> >> +#include <linux/kernel.h>
> >> +#include <linux/init.h>
> >> +#include <linux/types.h>
> >> +#include <acpi/acpi_drivers.h>
> >> +
> >> +#define ACPI_MEMORY_DEVICE_CLASS "memory"
> >> +#define ACPI_MEMORY_DEVICE_HID "PNP0C80"
> >> +#define ACPI_MEMORY_DEVICE_NAME "Hotplug Mem Device"
> >
> > Weird tabs?
> >
>
> It ported from native and seems right tabs? will double check.
>
> >> +
> >> +#undef PREFIX
> >
> > Why the #undef ?
> >> +#define PREFIX "ACPI:memory_hp:"
> >
> >
> > Not "ACPI:memory_xen:" ?
>
> OK, how about more detailed "ACPI:xen_memory_hotplug:"?

Sure.
>
> >
> >
> >> +
> >> +static int acpi_memory_device_add(struct acpi_device *device);
> >> +static int acpi_memory_device_remove(struct acpi_device *device,
> >> int type); + +static const struct acpi_device_id memory_device_ids[]
> >> = { + {ACPI_MEMORY_DEVICE_HID, 0},
> >> + {"", 0},
> >> +};
> >> +
> >> +static struct acpi_driver acpi_memory_device_driver = {
> >> + .name = "acpi_memhotplug",
> >
> > Not 'xen_acpi_memhotplug' ?
>
> No, here driver name (same as native driver name) used to block native driver loading.

Then you need a comment in the file explaining that.

>
> >
> >> + .class = ACPI_MEMORY_DEVICE_CLASS,
> >> + .ids = memory_device_ids,
> >> + .ops = {
> >> + .add = acpi_memory_device_add,
> >> + .remove = acpi_memory_device_remove,
> >
> > Just for sake of clarity I would prefix those with 'xen_'.
>
> OK.
>
> >
> >> + },
> >> +};
> >> +
> >> +struct acpi_memory_info {
> >> + struct list_head list;
> >> + u64 start_addr; /* Memory Range start physical addr */
> >> + u64 length; /* Memory Range length */
> >> + unsigned short caching; /* memory cache attribute */
> >> + unsigned short write_protect; /* memory read/write attribute */
> >
> > Can't the write_protect by a bit field like the 'enabled'? So
> > unsigned int write_protect:1;
> > ?
>
> Seems not good, write_protect copied from an acpi buffer (byte3) getting from _CRS evaluation.

Ah, pls put a comment in there as well why that cannot be done.

>
> >> + unsigned int enabled:1;
> >> +};
> >> +
> >> +struct acpi_memory_device {
> >> + struct acpi_device *device;
> >> + struct list_head res_list;
> >> +};
> >> +
> >> +static int acpi_hotmem_initialized;
> >
> > Just make it a bool and also use __read_mostly please.
>
> OK.
>
> >
> >> +
> >> +
> >> +int xen_acpi_memory_enable_device(struct acpi_memory_device
> >> *mem_device) +{ + return 0;
> >> +}
> >
> > Why even have this function if it does not do anything?
>
> Not a nop, it implemented at patch 2/2.

Yup, saw it in the next patch.
>
> >
> >> +
> >> +static acpi_status
> >> +acpi_memory_get_resource(struct acpi_resource *resource, void
> >> *context) +{ + struct acpi_memory_device *mem_device = context;
> >> + struct acpi_resource_address64 address64;
> >> + struct acpi_memory_info *info, *new;
> >> + acpi_status status;
> >> +
> >> + status = acpi_resource_to_address64(resource, &address64); + if
> >> (ACPI_FAILURE(status) || + (address64.resource_type !=
> >> ACPI_MEMORY_RANGE)) + return AE_OK; +
> >> + list_for_each_entry(info, &mem_device->res_list, list) {
> >> + /* Can we combine the resource range information? */
> >
> > I don't know? Is this is a future TODO?
>
> I'm also not quite sure, this comments ported from native side.

OK, pls find out. Perhaps this comment is stale.

>
> >
> >> + if ((info->caching == address64.info.mem.caching) &&
> >> + (info->write_protect == address64.info.mem.write_protect) &&
> >> + (info->start_addr + info->length == address64.minimum)) {
> >> + info->length += address64.address_length;
> >> + return AE_OK;
> >> + }
> >> + }
> >> +
> >> + new = kzalloc(sizeof(struct acpi_memory_info), GFP_KERNEL); + if
> >> (!new) + return AE_ERROR;
> >> +
> >> + INIT_LIST_HEAD(&new->list);
> >> + new->caching = address64.info.mem.caching;
> >> + new->write_protect = address64.info.mem.write_protect;
> >> + new->start_addr = address64.minimum;
> >> + new->length = address64.address_length;
> >> + list_add_tail(&new->list, &mem_device->res_list); +
> >> + return AE_OK;
> >> +}
> >> +
> >> +static int
> >> +acpi_memory_get_device_resources(struct acpi_memory_device
> >> *mem_device) +{ + acpi_status status;
> >> + struct acpi_memory_info *info, *n;
> >> +
> >> + if (!list_empty(&mem_device->res_list))
> >> + return 0;
> >> +
> >> + status = acpi_walk_resources(mem_device->device->handle,
> >> + METHOD_NAME__CRS, acpi_memory_get_resource, mem_device); +
> >> + if (ACPI_FAILURE(status)) {
> >> + list_for_each_entry_safe(info, n, &mem_device->res_list, list)
> >> + kfree(info); + INIT_LIST_HEAD(&mem_device->res_list);
> >> + return -EINVAL;
> >> + }
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int
> >> +acpi_memory_get_device(acpi_handle handle,
> >> + struct acpi_memory_device **mem_device)
> >> +{
> >> + acpi_status status;
> >> + acpi_handle phandle;
> >> + struct acpi_device *device = NULL;
> >> + struct acpi_device *pdevice = NULL;
> >> + int result;
> >> +
> >> + if (!acpi_bus_get_device(handle, &device) && device) + goto end;
> >> +
> >> + status = acpi_get_parent(handle, &phandle);
> >> + if (ACPI_FAILURE(status)) {
> >> + pr_warn(PREFIX "Cannot find acpi parent\n");
> >> + return -EINVAL;
> >> + }
> >> +
> >> + /* Get the parent device */
> >> + result = acpi_bus_get_device(phandle, &pdevice);
> >> + if (result) {
> >> + pr_warn(PREFIX "Cannot get acpi bus device\n");
> >> + return -EINVAL;
> >> + }
> >> +
> >> + /*
> >> + * Now add the notified device. This creates the acpi_device
> >> + * and invokes .add function
> >> + */
> >> + result = acpi_bus_add(&device, pdevice, handle,
> >> ACPI_BUS_TYPE_DEVICE); + if (result) { + pr_warn(PREFIX "Cannot add
> >> acpi bus\n"); + return -EINVAL;
> >> + }
> >> +
> >> +end:
> >> + *mem_device = acpi_driver_data(device);
> >> + if (!(*mem_device)) {
> >> + pr_err(PREFIX "Driver data not found\n");
> >> + return -ENODEV;
> >> + }
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int acpi_memory_check_device(struct acpi_memory_device
> >> *mem_device) +{ + unsigned long long current_status;
> >> +
> >> + /* Get device present/absent information from the _STA */
> >> + if (ACPI_FAILURE(acpi_evaluate_integer(mem_device->device->handle,
> >> + "_STA", NULL, &current_status)))
> >> + return -ENODEV;
> >> + /*
> >> + * Check for device status. Device should be
> >> + * present/enabled/functioning.
> >> + */
> >> + if (!((current_status & ACPI_STA_DEVICE_PRESENT)
> >> + && (current_status & ACPI_STA_DEVICE_ENABLED)
> >> + && (current_status & ACPI_STA_DEVICE_FUNCTIONING)))
> >> + return -ENODEV; +
> >> + return 0;
> >> +}
> >> +
> >> +static int acpi_memory_disable_device(struct acpi_memory_device
> >> *mem_device) +{ + pr_warn(PREFIX "Xen does not support memory
> >> hotremove\n"); + + return -ENOSYS;
> >> +}
> >> +
> >> +static void acpi_memory_device_notify(acpi_handle handle, u32
> >> event, void *data) +{ + struct acpi_memory_device *mem_device;
> >> + struct acpi_device *device;
> >> + u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE; /* default */ +
> >> + switch (event) {
> >> + case ACPI_NOTIFY_BUS_CHECK:
> >> + ACPI_DEBUG_PRINT((ACPI_DB_INFO,
> >> + "\nReceived BUS CHECK notification for device\n")); + /* Fall
> >> Through */ + case ACPI_NOTIFY_DEVICE_CHECK:
> >> + if (event == ACPI_NOTIFY_DEVICE_CHECK)
> >> + ACPI_DEBUG_PRINT((ACPI_DB_INFO,
> >> + "\nReceived DEVICE CHECK notification for device\n")); +
> >> + if (acpi_memory_get_device(handle, &mem_device)) {
> >> + pr_err(PREFIX "Cannot find driver data\n");
> >> + break;
> >> + }
> >> +
> >> + ost_code = ACPI_OST_SC_SUCCESS;
> >> + break;
> >> +
> >> + case ACPI_NOTIFY_EJECT_REQUEST:
> >> + ACPI_DEBUG_PRINT((ACPI_DB_INFO,
> >> + "\nReceived EJECT REQUEST notification for device\n")); +
> >> + if (acpi_bus_get_device(handle, &device)) {
> >> + pr_err(PREFIX "Device doesn't exist\n");
> >> + break;
> >> + }
> >> + mem_device = acpi_driver_data(device);
> >> + if (!mem_device) {
> >> + pr_err(PREFIX "Driver Data is NULL\n");
> >> + break;
> >> + }
> >> +
> >> + /*
> >> + * TBD: implement acpi_memory_disable_device and invoke
> >> + * acpi_bus_remove if Xen support hotremove in the future + */
> >> + acpi_memory_disable_device(mem_device);
> >> + break;
> >> +
> >> + default:
> >> + ACPI_DEBUG_PRINT((ACPI_DB_INFO,
> >> + "Unsupported event [0x%x]\n", event));
> >> + /* non-hotplug event; possibly handled by other handler */
> >> + return; + }
> >> +
> >> + /* Inform firmware that the hotplug operation has completed */
> >> + (void) acpi_evaluate_hotplug_ost(handle, event, ost_code, NULL);
> >
> >
> > Hm, even if we failed? Say for the ACPI_NOTIFY_EJECT_REQUEST ?
>
> OK, let's remove this the comments 'Inform firmware that the hotplug operation has completed'
> For ACPI_NOTIFY_EJECT_REQUEST, it in fact inform firmware 'ACPI_OST_SC_NON_SPECIFIC_FAILURE'.
>
> >
> >> + return;
> >> +}
> >> +
> >> +static int acpi_memory_device_add(struct acpi_device *device) +{
> >> + int result;
> >> + struct acpi_memory_device *mem_device = NULL;
> >> +
> >> +
> >> + if (!device)
> >> + return -EINVAL;
> >> +
> >> + mem_device = kzalloc(sizeof(struct acpi_memory_device),
> >> GFP_KERNEL); + if (!mem_device) + return -ENOMEM;
> >> +
> >> + INIT_LIST_HEAD(&mem_device->res_list);
> >> + mem_device->device = device;
> >> + sprintf(acpi_device_name(device), "%s", ACPI_MEMORY_DEVICE_NAME);
> >> + sprintf(acpi_device_class(device), "%s", ACPI_MEMORY_DEVICE_CLASS);
> >> + device->driver_data = mem_device;
> >> +
> >> + /* Get the range from the _CRS */
> >> + result = acpi_memory_get_device_resources(mem_device); + if
> >> (result) { + kfree(mem_device);
> >> + return result;
> >> + }
> >> +
> >> + /*
> >> + * Early boot code has recognized memory area by EFI/E820.
> >> + * If DSDT shows these memory devices on boot, hotplug is not
> >> necessary + * for them. So, it just returns until completion of
> >> this driver's + * start up. + */
> >> + if (!acpi_hotmem_initialized)
> >> + return 0;
> >> +
> >> + if (!acpi_memory_check_device(mem_device))
> >> + result = xen_acpi_memory_enable_device(mem_device);
> >
> > This is a nop. Could you just do:
> > result = 0;
> > ?
>
> It implemented at patch 2/2.
>
> Thanks,
> Jinsong
>
> >
> >> +
> >> + return result;
> >> +}
> >> +
> >> +static int acpi_memory_device_remove(struct acpi_device *device,
> >> int type) +{ + struct acpi_memory_device *mem_device = NULL;
> >> +
> >> + if (!device || !acpi_driver_data(device))
> >> + return -EINVAL;
> >> +
> >> + mem_device = acpi_driver_data(device);
> >> + kfree(mem_device);
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +/*
> >> + * Helper function to check for memory device
> >> + */
> >> +static acpi_status is_memory_device(acpi_handle handle) +{
> >> + char *hardware_id;
> >> + acpi_status status;
> >> + struct acpi_device_info *info;
> >> +
> >> + status = acpi_get_object_info(handle, &info);
> >> + if (ACPI_FAILURE(status))
> >> + return status;
> >> +
> >> + if (!(info->valid & ACPI_VALID_HID)) {
> >> + kfree(info);
> >> + return AE_ERROR;
> >> + }
> >> +
> >> + hardware_id = info->hardware_id.string;
> >> + if ((hardware_id == NULL) ||
> >> + (strcmp(hardware_id, ACPI_MEMORY_DEVICE_HID))) + status =
> >> AE_ERROR; +
> >> + kfree(info);
> >> + return status;
> >> +}
> >> +
> >> +static acpi_status
> >> +acpi_memory_register_notify_handler(acpi_handle handle,
> >> + u32 level, void *ctxt, void **retv)
> >> +{
> >> + acpi_status status;
> >> +
> >> + status = is_memory_device(handle);
> >> + if (ACPI_FAILURE(status))
> >> + return AE_OK; /* continue */
> >> +
> >> + status = acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
> >> + acpi_memory_device_notify, NULL);
> >> + /* continue */
> >> + return AE_OK;
> >> +}
> >> +
> >> +static int __init xen_acpi_memory_device_init(void) +{
> >> + int result;
> >> + acpi_status status;
> >> +
> >> + /* only dom0 is responsible for xen acpi memory hotplug */ + if
> >> (!xen_initial_domain()) + return -ENODEV;
> >> +
> >> + result = acpi_bus_register_driver(&acpi_memory_device_driver);
> >> + if (result < 0) + return -ENODEV;
> >> +
> >> + status = acpi_walk_namespace(ACPI_TYPE_DEVICE, ACPI_ROOT_OBJECT,
> >> + ACPI_UINT32_MAX, +
> >> acpi_memory_register_notify_handler, NULL, + NULL, NULL); +
> >> + if (ACPI_FAILURE(status)) {
> >> + pr_warn(PREFIX "walk_namespace failed\n");
> >> + acpi_bus_unregister_driver(&acpi_memory_device_driver); + return
> >> -ENODEV; + }
> >> +
> >> + acpi_hotmem_initialized = 1;
> >> + return 0;
> >> +}
> >> +subsys_initcall(xen_acpi_memory_device_init);
> >> --
> >> 1.7.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/