Re: [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target

From: Mike Christie
Date: Thu Mar 02 2023 - 11:39:06 EST


On 3/1/23 3:15 PM, Mike Christie wrote:
> On 2/28/23 9:40 PM, zhongjinghua wrote:
>>> On 2023/2/13 11:43, Zhong Jinghua wrote:
>>>> From: Zhong Jinghua <zhongjinghua@xxxxxxxxxx>
>>>>
>>>> KASAN reports a use-after-free like the one below:
>>>>
>>>> BUG: KASAN: use-after-free in scsi_target_reap+0x6c/0x70
>>>>
>>>> Workqueue: scsi_wq_1 __iscsi_unbind_session [scsi_transport_iscsi]
>>>> Call trace:
>>>>   dump_backtrace+0x0/0x320
>>>>   show_stack+0x24/0x30
>>>>   dump_stack+0xdc/0x128
>>>>   print_address_description+0x68/0x278
>>>>   kasan_report+0x1e4/0x308
>>>>   __asan_report_load4_noabort+0x30/0x40
>>>>   scsi_target_reap+0x6c/0x70
>>>>   scsi_remove_target+0x430/0x640
>>>>   __iscsi_unbind_session+0x164/0x268 [scsi_transport_iscsi]
>>>>   process_one_work+0x67c/0x1350
>>>>   worker_thread+0x370/0xf90
>>>>   kthread+0x2a4/0x320
>>>>   ret_from_fork+0x10/0x18
>>>>
>>>> The problem is caused by a concurrency scenario:
>>>>
>>>> T0: delete target
>>>> // echo 1 > /sys/devices/platform/host1/session1/target1:0:0/1:0:0:1/delete
>>>> T1: logout
>>>> // iscsiadm -m node --logout
>>>>
>>>> T0                            T1
>>>>   sdev_store_delete
>>>>    scsi_remove_device
>>>>     device_remove_file
>>>>      __scsi_remove_device
>>>>                              __iscsi_unbind_session
>>>>                               scsi_remove_target
>>>>                           spin_lock_irqsave
>>>>                                list_for_each_entry
>>>>       scsi_target_reap // starget->reap_ref 1 -> 0
>>>>                                 kref_get(&starget->reap_ref);
>>>>                           // warn use-after-free.
>>>>                           spin_unlock_irqrestore
>>>>        scsi_target_reap_ref_release
>>>>     scsi_target_destroy
>>>>     ... // delete starget
>>>>                           scsi_target_reap
>>>>                           // UAF
>>>>
>>>> When T0 has dropped the reference count to 0 but the target has not yet
>>>> been released, T1 can still find it in list_for_each_entry, and kref_get
>>>> then reports a use-after-free.
>>>>
>>>> Fix it by using kref_get_unless_zero() to check for a reference count of
>>>> 0.
>>>>
>>>> Signed-off-by: Zhong Jinghua <zhongjinghua@xxxxxxxxxx>
>>>> ---
>>>>   drivers/scsi/scsi_sysfs.c | 12 +++++++++++-
>>>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>>>> index e7893835b99a..0ad357ff4c59 100644
>>>> --- a/drivers/scsi/scsi_sysfs.c
>>>> +++ b/drivers/scsi/scsi_sysfs.c
>>>> @@ -1561,7 +1561,17 @@ void scsi_remove_target(struct device *dev)
>>>>               starget->state == STARGET_CREATED_REMOVE)
>>>>               continue;
>>>>           if (starget->dev.parent == dev || &starget->dev == dev) {
>>>> -            kref_get(&starget->reap_ref);
>>>> +
>>>> +            /*
>>>> +             * If starget->reap_ref is reduced to 0, it means
>>>> +             * that other processes are releasing it and
>>>> +             * there is no need to delete it again
>>>> +             */
>>>> +            if (!kref_get_unless_zero(&starget->reap_ref)) {
>>>> +                spin_unlock_irqrestore(shost->host_lock, flags);
>>>> +                goto restart;
>>>> +            }
>>>> +
>
> Patch looks ok.
>
> Is there another bug in the existing kref_get_unless_zero(&starget->reap_ref)
> call in scsi_alloc_target?
>
> I think scsi_alloc_target can find the target on the __targets list, and
> its call to kref_get_unless_zero will succeed if we have only gotten as far
> as taking our own ref (we have not done __scsi_remove_target and have not
> done the scsi_target_reap call at the end of the function).
>
> But if scsi_remove_target has set the target state to STARGET_REMOVE, the thread
> that did scsi_alloc_target wouldn't be able to put the target into the correct state
> (the scsi_target_add call will see the target state and return). So later if the
> driver/transport class did scsi_remove_target again to remove the target that
> the scsi_alloc_target call re-added, we see the target->state still in STARGET_REMOVE
> and it won't get deleted.
>
> Can we solve both issues at the same time?

I looked into this last part of my comment, and I don't think it's possible.
I thought we could just change around when we add/delete the target from the
__targets list and when the target_alloc/destroy callouts are done, but that
is more difficult than it looks.
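
For reference, the behavior the patch relies on can be modeled in userspace.
This is only a minimal sketch under stated assumptions, not the kernel's
kref implementation: struct model_kref and the model_* helpers are
hypothetical names built on C11 atomics, and locking (host_lock) is left out.
It shows that once the last reference is dropped, a late lookup is refused
instead of bumping the count from 0 the way a plain kref_get would.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct kref (assumption). */
struct model_kref {
	atomic_int refcount;
};

/* A new object starts with one reference held by its creator. */
static void model_kref_init(struct model_kref *k)
{
	atomic_init(&k->refcount, 1);
}

/* Take a reference only if the count is still non-zero. */
static bool model_kref_get_unless_zero(struct model_kref *k)
{
	int old = atomic_load(&k->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&k->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool model_kref_put(struct model_kref *k)
{
	return atomic_fetch_sub(&k->refcount, 1) == 1;
}

/*
 * Replays the T0/T1 ordering from the report in a single thread:
 * T0 drops the last reference, then T1 tries to take one.
 * Returns true if T1's late attempt was refused.
 */
static bool model_late_get_is_refused(void)
{
	struct model_kref reap_ref;

	model_kref_init(&reap_ref);

	/* T0: scsi_target_reap drops the count 1 -> 0. */
	bool last = model_kref_put(&reap_ref);
	assert(last);

	/* T1: the list walk still sees the target and tries to pin it. */
	return !model_kref_get_unless_zero(&reap_ref);
}
```

In the patch, the refused case corresponds to dropping host_lock and jumping
back to restart, so the target that is already being torn down is not touched
again by scsi_remove_target.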