On Thu, Dec 14, 2017 at 03:56:46PM +0800, Jason Yan wrote:
On 2017/12/14 15:42, Greg KH wrote:
On Thu, Dec 14, 2017 at 11:39:36AM +0800, Jason Yan wrote:
Some drivers may have the chance to increase a reference count that
has dropped to zero when using get_device() because of their design.

Then those drivers are broken :)

We have met such an issue with scsi:
https://www.spinics.net/lists/linux-scsi/msg115295.html
The scsi core will keep the scsi device object in the host list after
it has been deleted and the iterator can still find it. All of the
places that need to iterate have to check the state of the scsi device,
which adds a lot of redundant and complex code.
Provide a safe mechanism in get_device() by using
kobject_get_unless_zero().
Suggested-by: Bart Van Assche <bart.vanassche@xxxxxxx>
Signed-off-by: Jason Yan <yanaijie@xxxxxxxxxx>
CC: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
CC: Bart Van Assche <bart.vanassche@xxxxxxx>
CC: Ewan D. Milne <emilne@xxxxxxxxxx>
CC: James E.J. Bottomley <jejb@xxxxxxxxxxxxxxxxxx>
CC: Christoph Hellwig <hch@xxxxxx>
---
 drivers/base/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 12ebd05..cc74810 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1916,7 +1916,7 @@ EXPORT_SYMBOL_GPL(device_register);
  */
 struct device *get_device(struct device *dev)
 {
-	return dev ? kobj_to_dev(kobject_get(&dev->kobj)) : NULL;
+	return dev && kobject_get_unless_zero(&dev->kobj) ? dev : NULL;

I really don't want to do this, the bus the device is on should prevent
this from happening.
Also, once that reference count drops to zero, the memory should be
freed, so you really have a stale pointer here, and this code would fail
if you had slab debugging enabled anyway.
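
For readers following along, the difference between the two kinds of "get" discussed above can be modelled in userspace roughly as follows. This is only a sketch using C11 atomics: struct obj, obj_get(), obj_get_unless_zero() and obj_put() are hypothetical names for illustration and are not the kernel's kobject or kref code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object; not struct kobject. */
struct obj {
	atomic_int refcount;
};

/* Unconditional get: increments even if the count already reached zero. */
static struct obj *obj_get(struct obj *o)
{
	if (o)
		atomic_fetch_add(&o->refcount, 1);
	return o;
}

/* Conditional get: succeeds only while some reference is still held. */
static struct obj *obj_get_unless_zero(struct obj *o)
{
	int old;

	if (!o)
		return NULL;
	old = atomic_load(&o->refcount);
	while (old != 0) {
		/* On failure 'old' is refreshed with the current value; retry. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return o;
	}
	return NULL;
}

/* Drop one reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int prev = atomic_fetch_sub(&o->refcount, 1);

	assert(prev > 0);
	return prev == 1;
}
```

Note that the conditional variant still has to read the counter, so it is only meaningful while the memory behind the pointer is valid. In this model nothing frees the object; in a real refcounted object the release path would normally free it once the count reaches zero, which is the stale-pointer concern raised above.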

Actually I don't want this either. But the design of the scsi core leaves
the scsi device on the host list after it is deleted, so it can be
found later and the refcount has a good chance of being incremented from
zero again. And after a lot of discussion it seems difficult for the
scsi layer to change this in the near future.

Keeping a reference-counted 'struct device' chunk of memory on a list
that has a different lifetime rule from the device itself is crazy.
And yes, I remember how all of this came about, but I really don't have
the time to work on it myself...
So I don't even think this fixes the issue you think it fixes :)

This issue is very easy to reproduce on my machine and I have tested the
patch and it really fixes the issue.

Even with slab debugging enabled? If so, what is keeping that memory
from being freed once the reference count drops to 0?
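
One common way to give a list the same lifetime rule as the objects on it is to let the list itself own a reference, so a lookup can never observe a zero count. The following is a minimal single-threaded sketch of that idea, not a proposal for the scsi code: struct node and the node_*() helpers are hypothetical, and a real implementation would hold a lock around the list walks and use an atomic counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	int id;
	int refcount;		/* plain int: single-threaded model only */
	struct node *next;
};

static struct node *head;	/* the list; a lock would protect this */
static int freed_count;		/* counts frees, for illustration only */

/* Drop one reference and free the node when the last one goes away. */
static void node_put(struct node *n)
{
	assert(n->refcount > 0);
	if (--n->refcount == 0) {
		freed_count++;
		free(n);
	}
}

/* Create a node; the list owns the initial reference. */
static bool node_add(int id)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return false;
	n->id = id;
	n->refcount = 1;
	n->next = head;
	head = n;
	return true;
}

/* Anything still on the list has refcount >= 1, so a plain get is safe. */
static struct node *node_lookup(int id)
{
	for (struct node *n = head; n; n = n->next) {
		if (n->id == id) {
			n->refcount++;
			return n;
		}
	}
	return NULL;
}

/* Unlink first, then drop the list's reference. */
static bool node_remove(int id)
{
	for (struct node **pp = &head; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			struct node *n = *pp;

			*pp = n->next;
			n->next = NULL;
			node_put(n);
			return true;
		}
	}
	return false;
}
```

With this ordering a node is unreachable from the list before its count can reach zero, so no iterator has to check for a dead object, and a caller that looked the node up earlier keeps it alive until its own node_put().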
I think you are just papering over the real issue here, which is one
reason I really do not like the get_unless_zero() function at all.
thanks,
greg k-h