Re: [PATCH 1/2] ipmi_ssif: Unregister i2c device only if created by ssif

From: George Cherian
Date: Tue Aug 28 2018 - 10:33:06 EST



Hi Corey,

On 08/28/2018 04:59 AM, Corey Minyard wrote:

On 08/27/2018 01:07 AM, George Cherian wrote:

Hi Corey,

On 08/24/2018 06:37 PM, Corey Minyard wrote:

On 08/24/2018 06:10 AM, George Cherian wrote:
In the ssif_probe() error path the i2c client is left hanging so that
ssif_platform_remove() will remove it. But it is quite possible that
ssif never called i2c_new_device() for that client.
This condition leads to the kernel crash shown below.
To fix this, leave only the clients that ssif itself registered hanging
in the error path. All other, non-registered clients are set to NULL.

I'm having a hard time seeing how this could happen.

The i2c_new_device() call is only done in the case of dmi_ipmi_probe
(called from ssif_platform_probe) or a hard-coded entry. How does
ssif_platform_remove get called on a device that was not registered
with ssif_platform_probe?


Initially I had the same doubt, but ssif_adapter_handler() is called
for each i2c_dev only after initialized is true. So we end up not
calling i2c_new_device() for devices probed during module_init time.


I spent some time looking at this, and I don't think that's what is
happening.

I think that i2c_del_driver() in cleanup_ipmi_ssif() is causing
i2c_unregister_device() to be called on all the devices, and
platform_driver_unregister() causes it to be called on the
devices that came in through the platform method. It's
a double-free.

Try reversing the order of i2c_del_driver() and platform_driver_unregister()
in cleanup_ipmi_ssif() to test this.

Reversing the call order didn't help, I am still getting the trace.

You are partly correct about the double-free scenario. I don't see a
double free in normal operation. I see a double free only in the probe
failure case.


I have added prints in i2c_unregister_device() to print the client.
pr_err("client = %px\n", client);

In the normal case, I get 2 calls to i2c_unregister_device():
Call 1: i2c-core: client = ffff800ada315400 => called from i2c_del_driver()
This in turn calls ssif_remove, where we set addr_info->client to NULL.

Call 2: i2c-core: client = 0000000000000000 => called from ssif_platform_remove()
This is fine since i2c_unregister_device() is NULL aware, so it
completes without crashing.

Now in the probe failure case, I also get 2 calls to i2c_unregister_device():
Call 1: i2c-core: client = ffff800ad9897400 => called from i2c_del_driver()
This never calls ssif_remove, since the probe failed, so
addr_info->client is never set to NULL.

Call 2: i2c-core: client = ffff800ad9897400 => called from ssif_platform_remove()
The same client is unregistered a second time. This is the double free.
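
To make the two cases concrete, here is a minimal user-space model of the
unload sequence. It is only a sketch: the mock_* names are hypothetical
stand-ins, not kernel APIs, and it counts unregister calls instead of
freeing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct i2c_client. */
struct mock_client {
	int unregister_calls;	/* how many times this client was torn down */
};

/* Hypothetical stand-in for struct ssif_addr_info. */
struct mock_addr_info {
	struct mock_client *client;
};

/* Like i2c_unregister_device() as described above: NULL is ignored. */
void mock_unregister(struct mock_client *client)
{
	if (!client)
		return;
	client->unregister_calls++;
}

/*
 * Models module unload and returns how many times the one client was
 * unregistered. A result of 2 corresponds to the double free.
 */
int mock_module_unload(struct mock_addr_info *info, bool probe_succeeded)
{
	struct mock_client *client = info->client;

	assert(client != NULL);

	/* Call 1: the i2c_del_driver() path. */
	mock_unregister(client);
	if (probe_succeeded)
		info->client = NULL;	/* what ssif_remove does */

	/* Call 2: the ssif_platform_remove() path. */
	mock_unregister(info->client);

	return client->unregister_calls;
}
```

With probe_succeeded set to true the model reports one unregister call, and
with false it reports two on the same pointer, which matches the prints above.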

Do you think the proposed patch is the right solution, or should we
not leave addr_info->client hanging in the probe error path at all?

NB: For easy simulation of the ssif_probe failure case I just replaced

ssif_info->thread = kthread_run(....) with

ssif_info->thread = ERR_PTR(-4);

so that the probe takes the goto out path.
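
For readers unfamiliar with the error-pointer convention used in that
simulation, here is a simplified user-space model of it. The model_* names
are hypothetical; the kernel's own ERR_PTR()/IS_ERR()/PTR_ERR() live in
linux/err.h and differ in detail. The assumption is that ssif_probe checks
the thread pointer with IS_ERR() and takes the error path with the encoded
value (-4 is -EINTR on Linux).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: error codes occupy the top 4095 values of the address space. */
#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value back out of an error pointer. */
long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer falls in the range reserved for error codes. */
bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Models the check after thread creation: 0 on success, else the errno. */
int model_thread_start_result(void *thread)
{
	if (model_is_err(thread))
		return (int)model_ptr_err(thread);
	return 0;
}
```

Passing model_err_ptr(-4) to model_thread_start_result() yields -4, which is
the kind of value that would send the probe down its error path.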

-George
-corey

ssif_platform_remove() gets called during removal of ipmi_ssif.
If ssif_probe() fails before ipmi_register_smi(), we leave
addr_info->client hanging.

In normal operation without errors, we set addr_info->client
to NULL after ipmi_unregister_smi() in ssif_remove.
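
The idea behind the proposed client_registered flag can be sketched in user
space as follows. This is a model under stated assumptions, not the driver
code: the fix_* names are hypothetical, and only the two fields the patch
touches are represented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two ssif_addr_info fields involved. */
struct fix_addr_info {
	void *client;		/* models addr_info->client */
	bool client_registered;	/* true only if ssif called i2c_new_device() */
};

/* Models the lines the patch adds to the ssif_probe() error path. */
void fix_probe_error_path(struct fix_addr_info *info)
{
	if (!info->client_registered)
		info->client = NULL;
}

/* Models whether ssif_platform_remove() still has a client to unregister. */
bool fix_platform_remove_unregisters(const struct fix_addr_info *info)
{
	return info->client != NULL;
}
```

In this model a client that ssif did not create is dropped on probe failure,
so ssif_platform_remove() has nothing left to unregister, while a client that
ssif created stays in place for ssif_platform_remove() to clean up.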

Small style comment inline...
I will make the changes and send out a v2.

Thanks,
-George

 CPU: 107 PID: 30266 Comm: rmmod Not tainted 4.18.0+ #80
 Hardware name: Cavium Inc. Saber/Saber, BIOS Cavium reference firmware version 7.0 08/04/2018
 pstate: 60400009 (nZCv daif +PAN -UAO)
 pc : kernfs_find_ns+0x28/0x120
 lr : kernfs_find_and_get_ns+0x40/0x60
 sp : ffff00002310fb50
 x29: ffff00002310fb50 x28: ffff800a8240f800
 x27: 0000000000000000 x26: 0000000000000000
 x25: 0000000056000000 x24: ffff000009073000
 x23: ffff000008998b38 x22: 0000000000000000
 x21: ffff800ed86de820 x20: 0000000000000000
 x19: ffff00000913a1d8 x18: 0000000000000000
 x17: 0000000000000000 x16: 0000000000000000
 x15: 0000000000000000 x14: 5300737265766972
 x13: 643d4d4554535953 x12: 0000000000000030
 x11: 0000000000000030 x10: 0101010101010101
 x9 : ffff800ea06cc3f9 x8 : 0000000000000000
 x7 : 0000000000000141 x6 : ffff000009073000
 x5 : ffff800adb706b00 x4 : 0000000000000000
 x3 : 00000000ffffffff x2 : 0000000000000000
 x1 : ffff000008998b38 x0 : ffff000008356760
 Process rmmod (pid: 30266, stack limit = 0x00000000e218418d)
 Call trace:
  kernfs_find_ns+0x28/0x120
  kernfs_find_and_get_ns+0x40/0x60
  sysfs_unmerge_group+0x2c/0x6c
  dpm_sysfs_remove+0x34/0x70
  device_del+0x58/0x30c
  device_unregister+0x30/0x7c
  i2c_unregister_device+0x84/0x90 [i2c_core]
  ssif_platform_remove+0x38/0x98 [ipmi_ssif]
  platform_drv_remove+0x2c/0x6c
  device_release_driver_internal+0x168/0x1f8
  driver_detach+0x50/0xbc
  bus_remove_driver+0x74/0xe8
  driver_unregister+0x34/0x5c
  platform_driver_unregister+0x20/0x2c
  cleanup_ipmi_ssif+0x50/0xd82c [ipmi_ssif]
  __arm64_sys_delete_module+0x1b4/0x220
  el0_svc_handler+0x104/0x160
  el0_svc+0x8/0xc
 Code: aa1e03e0 aa0203f6 aa0103f7 d503201f (7940e280)
 ---[ end trace 09f0e34cce8e2d8c ]---
 Kernel panic - not syncing: Fatal exception
 SMP: stopping secondary CPUs
 Kernel Offset: disabled
 CPU features: 0x23800c38

Signed-off-by: George Cherian <george.cherian@xxxxxxxxxx>
---
 drivers/char/ipmi/ipmi_ssif.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
index 18e4650..ccdf6b1 100644
--- a/drivers/char/ipmi/ipmi_ssif.c
+++ b/drivers/char/ipmi/ipmi_ssif.c
@@ -181,6 +181,7 @@ struct ssif_addr_info {
 	struct device *dev;
 	struct i2c_client *client;

+	bool client_registered;
 	struct mutex clients_mutex;
 	struct list_head clients;

@@ -1658,6 +1659,8 @@ static int ssif_probe(struct i2c_client *client, const struct i2c_device_id *id)
 		 * the client like it should.
 		 */
 		dev_err(&client->dev, "Unable to start IPMI SSIF: %d\n", rv);
+		if (!addr_info->client_registered)
+			addr_info->client = NULL;
 		kfree(ssif_info);
 	}
 	kfree(resp);
@@ -1672,11 +1675,14 @@ static int ssif_probe(struct i2c_client *client, const struct i2c_device_id *id)
 static int ssif_adapter_handler(struct device *adev, void *opaque)
 {
 	struct ssif_addr_info *addr_info = opaque;
+	struct i2c_client *client;

 	if (adev->type != &i2c_adapter_type)
 		return 0;

-	i2c_new_device(to_i2c_adapter(adev), &addr_info->binfo);
+	client = i2c_new_device(to_i2c_adapter(adev), &addr_info->binfo);
+	if (client)
+		addr_info->client_registered = true;


How about..
    if (i2c_new_device(to_i2c_adapter(adev), &addr_info->binfo))
        addr_info->client_registered = true;

No need for the client variable.

-corey

 	if (!addr_info->adapter_name)
 		return 1; /* Only try the first I2C adapter by default. */