Re: [RFC] ehea: kdump support using new shutdown hook

From: Michael Ellerman
Date: Thu Dec 13 2007 - 01:50:54 EST


On Wed, 2007-12-12 at 17:53 +0100, Thomas Klein wrote:
> This patch adds kdump support using the new PPC crash shutdown hook to the
> ehea driver. The driver now keeps a list of firmware handles which have to
> be freed in case of a crash. The crash handler does the minimum required: it
> frees the firmware resource handles plus broadcast/multicast registrations.
>
> Please comment.

Hi Thomas,

Here's a few ..

> ---
> diff -Nurp -X dontdiff linux-2.6.24-rc5/drivers/net/ehea/ehea.h patched_kernel/drivers/net/ehea/ehea.h
> --- linux-2.6.24-rc5/drivers/net/ehea/ehea.h 2007-12-11 04:48:43.000000000 +0100
> +++ patched_kernel/drivers/net/ehea/ehea.h 2007-12-12 17:30:53.000000000 +0100
> @@ -386,6 +386,7 @@ struct ehea_port_res {
>
>
> #define EHEA_MAX_PORTS 16
> +#define EHEA_MAX_RES_HANDLES (100 * EHEA_MAX_PORTS + 10)
> struct ehea_adapter {
> u64 handle;
> struct of_device *ofdev;
> @@ -397,6 +398,7 @@ struct ehea_adapter {
> u64 max_mc_mac; /* max number of multicast mac addresses */
> int active_ports;
> struct list_head list;
> + u64 res_handles[EHEA_MAX_RES_HANDLES];
> };

I don't like this ..

> diff -Nurp -X dontdiff linux-2.6.24-rc5/drivers/net/ehea/ehea_main.c patched_kernel/drivers/net/ehea/ehea_main.c
> --- linux-2.6.24-rc5/drivers/net/ehea/ehea_main.c 2007-12-11 04:48:43.000000000 +0100
> +++ patched_kernel/drivers/net/ehea/ehea_main.c 2007-12-12 17:30:53.000000000 +0100
> @@ -35,6 +35,7 @@
> #include <linux/if_ether.h>
> #include <linux/notifier.h>
> #include <linux/reboot.h>
> +#include <asm-powerpc/kexec.h>

Just <asm/kexec.h>

> @@ -3302,6 +3333,71 @@ static int __devexit ehea_remove(struct
> return 0;
> }
>
> +void ehea_crash_deregister(void)
> +{
> + struct ehea_adapter *adapter;
> + int i;
> + u64 hret;
> + u8 reg_type;
> +
> + list_for_each_entry(adapter, &adapter_list, list) {
> + for (i = 0; i < EHEA_MAX_PORTS; i++) {
> + struct ehea_port *port = adapter->port[i];
> + if (port->state == EHEA_PORT_UP) {
> + struct ehea_mc_list *mc_entry = port->mc_list;
> + struct list_head *pos;
> + struct list_head *temp;
> +
> + /* Undo multicast registrations */
> + list_for_each_safe(pos, temp,
> + &(port->mc_list->list)) {
> + mc_entry = list_entry(pos,
> + struct ehea_mc_list,
> + list);
> + ehea_multicast_reg_helper(port,
> + mc_entry->macaddr,
> + H_DEREG_BCMC);
> + }
> +
> + /* Undo broad registration */
> + reg_type = EHEA_BCMC_BROADCAST |
> + EHEA_BCMC_UNTAGGED;
> + ehea_h_reg_dereg_bcmc(port->adapter->handle,
> + port->logical_port_id,
> + reg_type, port->mac_addr,
> + 0, H_DEREG_BCMC);
> +
> + reg_type = EHEA_BCMC_BROADCAST |
> + EHEA_BCMC_VLANID_ALL;
> + ehea_h_reg_dereg_bcmc(port->adapter->handle,
> + port->logical_port_id,
> + reg_type, port->mac_addr,
> + 0, H_DEREG_BCMC);
> + }
> + }
> + for (i = 0; i < EHEA_MAX_RES_HANDLES; i++) {
> + u64 handle = adapter->res_handles[i];
> + if (handle) {
> + hret = ehea_h_free_resource(adapter->handle,
> + handle,
> + FORCE_FREE);
> + }
> + }
> +
> + if (adapter->neq) {
> + hret = ehea_h_free_resource(adapter->handle,
> + adapter->neq->fw_handle,
> + FORCE_FREE);
> + }
> +
> + if (adapter->mr.handle) {
> + hret = ehea_h_free_resource(adapter->handle,
> + adapter->mr.handle,
> + FORCE_FREE);
> + }
> + }
> +}

This is sort of on the right track, I like that the ehea_h_.. routines
are basically just hypercalls, but there's a few things wrong. You're
walking a linked list, which is asking for trouble, if a single pointer
is corrupted you'll walk off into lala land. Then you're pulling fields
out of structs via pointers, again hoping that they're all valid. Then
you walk another linked list .. And then you're pulling bits out of the
adapter again.

Ideally it would look like this:

static u64 things_to_free[];

void ehea_crash_deregister(void)
{
for (i = 0; i < num_things_to_free; i++)
h_call_free_a_thing(things_to_free[i]);
}

> static int ehea_reboot_notifier(struct notifier_block *nb,
> unsigned long action, void *unused)
> {
> @@ -3373,6 +3469,9 @@ int __init ehea_module_init(void)
> goto out;
>
> register_reboot_notifier(&ehea_reboot_nb);
> + ret = crash_shutdown_register(&ehea_crash_deregister);
> + if (ret)
> + ehea_info("failed registering crash handler");

Your naming is a little confusing here, you're registering the
deregister function. I think I get it, you're unregistering the fw
handles, but perhaps ehea_crash_shutdown() would be clearer.

> @@ -3396,10 +3496,15 @@ out:
>
> static void __exit ehea_module_exit(void)
> {
> + int ret;
> +
> flush_scheduled_work();
> driver_remove_file(&ehea_driver.driver, &driver_attr_capabilities);
> ibmebus_unregister_driver(&ehea_driver);
> unregister_reboot_notifier(&ehea_reboot_nb);
> + ret = crash_shutdown_unregister(&ehea_crash_deregister);
> + if (ret)
> + ehea_info("failed unregistering crash handler");

You don't need ret if that's all you're going to do with it.

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

Attachment: signature.asc
Description: This is a digitally signed message part