Re: [PATCH] vhost/scsi: Fix improper cleanup in vhost_scsi_set_endpoint()

From: Mike Christie
Date: Tue Jan 14 2025 - 12:44:51 EST


On 1/14/25 1:40 AM, 张浩然 wrote:
> After reevaluating the PoC, I realized that my initial claim was incorrect. The target WWN in the second vhost_scsi_set_endpoint() call is not the same as in the first one. Below is my targetcli status:
>
> o- vhost ......................................... [Targets: 3]
> | o- naa.500140501e23be28 ......................... [TPGs: 1]
> | | o- tpg1 ............. [naa.50014058f7da10b7, no-gen-acls]
> | | o- acls ..................................... [ACLs: 0]
> | | o- luns ..................................... [LUNs: 0]
> | o- naa.500140562c8936fa ......................... [TPGs: 2]
> | | o- tpg1 ............. [naa.50014058d133f962, no-gen-acls]
> | | | o- acls ..................................... [ACLs: 0]
> | | | o- luns ..................................... [LUNs: 3]
> | | | o- lun0 ... [block/disk0 (/dev/disk/...) (default_tg_pt_gp)]
> | | | o- lun1 .... [fileio/vhost-fileio (/root/fileio-vhost) (default_tg_pt_gp)]
> | | | o- lun2 ............. [ramdisk/rd (default_tg_pt_gp)]
> | | o- tpg2 ............. [naa.50014055c6fb4182, no-gen-acls]
> | | o- acls ..................................... [ACLs: 0]
> | | o- luns ..................................... [LUNs: 0]
>
> The bug occurs when `naa.500140562c8936fa` has already been set as an endpoint, and I send a VHOST_SCSI_SET_ENDPOINT ioctl command with `naa.500140501e23be28`. The ioctl returns -1 EEXIST (File exists), and the kernel logs a BUG message in dmesg.

I see it now and can replicate it. I think there is a second bug in
vhost_scsi_set_endpoint() related to all this: we need to prevent
switching targets like this, or else we'll leak some other refcounts.
If 500140501e23be28's tpg number were 3, so that it did not collide
with a tpg number already in use, we would overwrite the existing
vs->vs_vhost_wwpn and never be able to release the refcounts on the
tpgs from 500140562c8936fa.
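
To make the leak concrete, here is a rough userspace toy model of the
scenario. It is not the kernel code: toy_tpg, toy_endpoint,
toy_set_endpoint(), toy_clear_endpoint() and toy_leaked_refs() are
hypothetical names, and the clear path is simplified on the assumption
that teardown only releases tpgs matching the recorded WWN.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_TPGT 8

/* Hypothetical stand-in for a target portal group (not the kernel struct). */
struct toy_tpg {
    const char *wwn;  /* name of the target this tpg belongs to */
    int tpgt;         /* tpg number, used as the slot index */
    int refs;         /* references taken by the toy endpoint */
};

/* Hypothetical stand-in for the per-device state. */
struct toy_endpoint {
    char wwpn[64];                      /* last WWN that was set */
    struct toy_tpg *slot[TOY_MAX_TPGT]; /* claimed tpgs, indexed by tpgt */
};

/* Claim every tpg of 'wwn'; fail if one of its slots is already taken. */
static int toy_set_endpoint(struct toy_endpoint *ep, struct toy_tpg *all,
                            int n, const char *wwn)
{
    int matched = 0;

    for (int i = 0; i < n; i++) {
        if (strcmp(all[i].wwn, wwn) != 0)
            continue;
        if (ep->slot[all[i].tpgt])
            return -1; /* models the EEXIST case from the report */
        all[i].refs++;
        ep->slot[all[i].tpgt] = &all[i];
        matched = 1;
    }
    if (!matched)
        return -2;
    /* Overwrites the recorded WWN even if another target was set before. */
    snprintf(ep->wwpn, sizeof(ep->wwpn), "%s", wwn);
    return 0;
}

/* Simplified teardown: release only tpgs that match the recorded WWN. */
static void toy_clear_endpoint(struct toy_endpoint *ep)
{
    for (int t = 0; t < TOY_MAX_TPGT; t++) {
        struct toy_tpg *tpg = ep->slot[t];

        if (!tpg || strcmp(tpg->wwn, ep->wwpn) != 0)
            continue;
        tpg->refs--;
        ep->slot[t] = NULL;
    }
}

/* Run the "switch targets" sequence and count references never dropped. */
int toy_leaked_refs(void)
{
    struct toy_tpg tpgs[] = {
        { "naa.500140562c8936fa", 1, 0 },
        { "naa.500140562c8936fa", 2, 0 },
        { "naa.500140501e23be28", 3, 0 }, /* hypothetical tpg number 3 */
    };
    struct toy_endpoint ep = { 0 };
    int rc, leaked = 0;

    rc = toy_set_endpoint(&ep, tpgs, 3, "naa.500140562c8936fa");
    assert(rc == 0);
    rc = toy_set_endpoint(&ep, tpgs, 3, "naa.500140501e23be28");
    assert(rc == 0); /* no slot collision, so the switch goes through */
    (void)rc;

    toy_clear_endpoint(&ep);
    for (int i = 0; i < 3; i++)
        leaked += tpgs[i].refs;
    return leaked;
}

#ifdef TOY_DEMO_MAIN
int main(void)
{
    printf("leaked refs: %d\n", toy_leaked_refs());
    return 0;
}
#endif
```

Built with -DTOY_DEMO_MAIN, the model reports two references left on
the tpgs of the first target, which is the kind of leak described
above. Rejecting the second set call when a different WWN is already
recorded would avoid it in this model.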

I'll send a patchset to fix everything and cc you.

Thanks for all the work you did testing and debugging this
issue.