Re: [PATCH] scsi: sd: unregister device if device_add_disk() failed in sd_probe()

From: Yu Kuai
Date: Fri Dec 22 2023 - 03:27:53 EST


Hi,

在 2023/12/22 14:49, Luis Chamberlain 写道:
On Fri, Dec 08, 2023 at 04:23:35PM +0800, linan666@xxxxxxxxxxxxxxx wrote:
From: Li Nan <linan122@xxxxxxxxxx>

"if device_add() succeeds, you should call device_del() when you want to
get rid of it."

In sd_probe(), device_add_disk() fails when device_add() has already
succeeded, so change put_device() to device_unregister() to ensure device
resources are released.

Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
Signed-off-by: Li Nan <linan122@xxxxxxxxxx>

Nacked-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>

---
drivers/scsi/sd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 542a4bbb21bc..d81cbeee06eb 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3736,7 +3736,7 @@ static int sd_probe(struct device *dev)
error = device_add_disk(dev, gd, NULL);
if (error) {
- put_device(&sdkp->disk_dev);
+ device_unregister(&sdkp->disk_dev);
put_disk(gd);
goto out;
}

This is incorrect, device_unregister() calls:

void device_unregister(struct device *dev)
{
pr_debug("device: '%s': %s\n", dev_name(dev), __func__);
device_del(dev);
put_device(dev);
}

So you're adding what you believe to be a correct missing device_del().
But what you missed is that if device_add_disk() fails then device_add()
did not succeed because the new code we have in the kernel *today* unwinds
this for us now.

I'm confused here, there are two device here, one is 'sdkp->disk_dev',
one is gendisk->part0->bd_device, and the order in which they
initialize:

sd_probe
device_add(&sdkp->disk_dev) -> succeed
device_add_disk -> failed, and device_add(bd_device) did not succeed
put_device(&sdkp->disk_dev) -> device_del is missed

I don't see that if device_add_disk() fail, device_del() for
'sdkp->disk_dev'is called from anywhere. Do I missing anything?

Thanks,
Kuai


What you missed is that in today's code inside device_add_disk(), if
device_add() succeeeds we now unwind and call device_del() for the
device for you. And so, quoting the next sentence you took from
device_add():

"If device_add() has *not* succeeded, use *only* put_device() to drop the
reference count."

Please do reference in the future a crash dump / or explain how you
reached your conclusions if you do not have a crash dump to prove an
issue. Specially if you are suggesting it Fixes a commit.

Luis

.