Re: [PATCH v1] PM-runtime: Check supplier_preactivated before release supplier

From: Nitin Rawat
Date: Wed Oct 12 2022 - 06:31:23 EST


Hi Peter/Rafael,
We are also observing a similar issue on our platform. It looks like there is a race condition (explained below) which causes the consumer to resume without bumping up the supplier's PM-runtime usage counter.

Process 1 (ufshcd_async_scan context)
ufshcd_async_scan()
    scsi_probe_and_add_lun
        scsi_add_lun
            slave_configure    -> enable rpm
            scsi_sysfs_add_sdev
                scsi_autopm_get_device
                device_add                <- invoked sd_probe in process 2
                scsi_autopm_put_device

Process 2 (sd_probe context)
driver_probe_device
__device_attach_async_helper
__device_attach_driver
driver_probe_device
__driver_probe_device
sd_probe
scsi_autopm_get_device



A race on dev->power.runtime_status for consumer device 0:0:0:0 can happen in the rpm framework as follows:

ufshcd_async_scan context (process 1)
scsi_autopm_put_device()    // 0:0:0:0
    pm_runtime_put_sync()
        __pm_runtime_idle()
            rpm_idle()
                __rpm_callback()
                    scsi_runtime_idle()
                        pm_runtime_mark_last_busy()
                        pm_runtime_autosuspend()
                            __pm_runtime_suspend(RPM_AUTO)
                                rpm_suspend(RPM_AUTO)
                                    status = RPM_SUSPENDING
                                    scsi_runtime_suspend()
                                    __rpm_callback()
                                    status = RPM_SUSPENDED    ------> 1
                                    rpm_suspend_suppliers()
                        return -EBUSY

                    (use_links) && (dev->power.runtime_status == RPM_RESUMING && retval)    ------> 3
                        __rpm_put_suppliers()





sd_probe context (Process 2)
scsi_autopm_get_device()    // 0:0:0:0
    __pm_runtime_resume(RPM_GET_PUT)
        rpm_resume
            status = RPM_RESUMING    ------> 2



After power.runtime_status of consumer 0:0:0:0 is changed to RPM_SUSPENDED (1), and before scsi_runtime_idle returns -16 (-EBUSY) to __rpm_callback, process 2 changes power.runtime_status of consumer 0:0:0:0 to RPM_RESUMING (2). Condition (3) therefore becomes true and __rpm_put_suppliers is called, so the consumer resumes with the supplier's usage_count decremented.
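
To make the ordering of steps 1, 2 and 3 concrete, here is a small user-space sketch that replays the interleaving sequentially. It is only a toy model, not the kernel code: struct model_dev, process2_resume(), idle_callback_returned() and replay() are hypothetical names, only the clause of condition (3) quoted in the trace is modeled, and the supplier reference taken on resume is assumed to be a single increment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for the runtime PM status values named in the trace. */
enum rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };

/* Hypothetical model of consumer 0:0:0:0 with a single supplier link. */
struct model_dev {
	enum rpm_status runtime_status; /* consumer's power.runtime_status */
	bool use_links;                 /* consumer has device links */
	int supplier_usage;             /* supplier's PM-runtime usage counter */
};

/* Condition (3), restricted to the clause quoted in the trace. */
static bool should_put_suppliers(const struct model_dev *dev, int retval)
{
	return dev->use_links &&
	       (dev->runtime_status == RPM_RESUMING && retval);
}

/* Step 2 (process 2): start resuming; assumed to take one supplier reference. */
static void process2_resume(struct model_dev *dev)
{
	dev->runtime_status = RPM_RESUMING;
	dev->supplier_usage++;
}

/* Step 3 (process 1): the idle callback has returned retval. */
static void idle_callback_returned(struct model_dev *dev, int retval)
{
	if (should_put_suppliers(dev, retval)) {
		assert(dev->supplier_usage > 0);
		dev->supplier_usage--; /* stands in for __rpm_put_suppliers() */
	}
}

/*
 * Replay the two orderings and return the supplier usage count that is left
 * once the consumer is active again.
 */
int replay(bool resume_races_with_idle)
{
	/* State right after step 1: consumer suspended, no reference held. */
	struct model_dev dev = { RPM_SUSPENDED, true, 0 };

	if (resume_races_with_idle) {
		process2_resume(&dev);                /* step 2 lands first */
		idle_callback_returned(&dev, -EBUSY); /* step 3 sees RPM_RESUMING */
	} else {
		idle_callback_returned(&dev, -EBUSY); /* status still RPM_SUSPENDED */
		process2_resume(&dev);
	}
	dev.runtime_status = RPM_ACTIVE;
	return dev.supplier_usage;
}
```

In this model replay(true) leaves the supplier with no reference while the consumer is active, whereas replay(false) leaves one reference in place.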

Please let me know your thoughts on this.

Regards,
Nitin

On 8/2/2022 7:03 PM, Peter Wang wrote:

On 8/2/22 7:01 PM, Rafael J. Wysocki wrote:
On Tue, Aug 2, 2022 at 5:19 AM Peter Wang <peter.wang@xxxxxxxxxxxx> wrote:

Hi Rafael,

Yes, it is very clear!
I missed the important point that usage_count is always > rpm_active 1.
I think this patch could work.

Thanks.
Peter




Hi Rafael,

After testing with commit 887371066039011144b4a94af97d9328df6869a2 ("PM:
runtime: Fix supplier device management during consumer probe") over the past weeks,
the supplier still suspends while the consumer is active "after"
pm_runtime_put_suppliers.
Do you have any idea why?
Well, this means that the consumer probe doesn't bump up the
supplier's PM-runtime usage counter as appropriate.

You need to tell me more about what happens during the consumer probe.
Which driver is this?

Hi Rafael,

I had the same thought, but I still don't know how it could happen.

It is the upstream UFS driver in the SCSI subsystem. Here is the call flow:
do_scan_async (process 1)
    do_scsi_scan_host
        scsi_scan_host_selected
            scsi_scan_channel
                __scsi_scan_target
                    scsi_probe_and_add_lun
                        scsi_alloc_sdev
                            slave_alloc     -> setup link
                        scsi_add_lun
                            slave_configure    -> enable rpm
                            scsi_sysfs_add_sdev
                                scsi_autopm_get_device    <- get runtime pm
                                device_add                <- invoke sd_probe in process 2
                                scsi_autopm_put_device    <- put runtime pm, point 1

driver_probe_device (process 2)
    __driver_probe_device
        pm_runtime_get_suppliers
            really_probe
                sd_probe
                    scsi_autopm_get_device                <- get runtime pm, point 2
                    pm_runtime_set_autosuspend_delay    <- set rpm delay to 2s
                    scsi_autopm_put_device                <- put runtime pm
        pm_runtime_put_suppliers                        <- (link->rpm_active = 1)

After process 1 calls scsi_autopm_put_device (point 1) and lets the consumer enter suspend,
process 2 calls scsi_autopm_get_device (point 2), which may resume the consumer without
bumping up the supplier's PM-runtime usage counter as appropriate.
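
The symptom described above can be sketched with a toy counter model of process 2. This is only an illustration under stated assumptions: struct probe_model, supplier_pinned() and probe_keeps_supplier_pinned() are hypothetical names, each get/put is treated as a plain increment or decrement, and the consumer is assumed to stay active after scsi_autopm_put_device because of the 2s autosuspend delay.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state: one consumer linked to one supplier. */
struct probe_model {
	int supplier_usage;   /* supplier's PM-runtime usage counter */
	bool consumer_active; /* consumer is runtime-active */
};

/* Expected property: an active consumer keeps its supplier referenced. */
static bool supplier_pinned(const struct probe_model *m)
{
	return !m->consumer_active || m->supplier_usage > 0;
}

/*
 * Replay process 2 from the call flow. resume_bumps_supplier selects whether
 * the consumer resume at point 2 takes its supplier reference.
 */
bool probe_keeps_supplier_pinned(bool resume_bumps_supplier)
{
	/* Consumer already suspended by process 1 at point 1. */
	struct probe_model m = { 0, false };

	m.supplier_usage++;           /* pm_runtime_get_suppliers */
	if (resume_bumps_supplier)
		m.supplier_usage++;   /* reference taken by the consumer resume */
	m.consumer_active = true;     /* scsi_autopm_get_device, point 2 */

	/* scsi_autopm_put_device: consumer assumed to stay active for 2s. */

	assert(m.supplier_usage > 0);
	m.supplier_usage--;           /* pm_runtime_put_suppliers */

	return supplier_pinned(&m);
}
```

With resume_bumps_supplier set to false, the counter reaches zero after pm_runtime_put_suppliers while the consumer is still active, which matches the observation that the supplier can suspend under an active consumer.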

Thanks.
Peter