Re: [PATCH v1] PM-runtime: Check supplier_preactivated before release supplier
From: Nitin Rawat
Date: Wed Oct 12 2022 - 06:31:23 EST
Hi Peter/Rafael,
We have also observed a similar issue on our platform. It looks like there is
a race condition (explained below) which causes the consumer to resume without
bumping up the supplier's PM-runtime usage counter.
Process 1 (ufshcd_async_scan context)
ufshcd_async_scan()
scsi_probe_and_add_lun
scsi_add_lun
slave_configure -> enable rpm
scsi_sysfs_add_sdev
scsi_autopm_get_device
device_add <- invoked sd_probe in process 2
scsi_autopm_put_device
Process 2 (sd_probe context)
driver_probe_device
__device_attach_async_helper
__device_attach_driver
driver_probe_device
__driver_probe_device
sd_probe
scsi_autopm_get_device
A race on dev->power.runtime_status for consumer dev 0:0:0:0
can happen in the rpm framework as shown below:
ufshcd_async_scan context (process 1)
scsi_autopm_put_device() //0:0:0:0
pm_runtime_put_sync()
__pm_runtime_idle()
rpm_idle()
__rpm_callback()
scsi_runtime_idle()
pm_runtime_mark_last_busy()
pm_runtime_autosuspend()
__pm_runtime_suspend(RPM_AUTO)
rpm_suspend(RPM_AUTO)
status = RPM_SUSPENDING
scsi_runtime_suspend()
__rpm_callback()
status = RPM_SUSPENDED------>1
rpm_suspend_suppliers()
return -EBUSY
(use_links) && (dev->power.runtime_status == RPM_RESUMING && retval)------->3
__rpm_put_suppliers()
sd_probe context (Process 2)
scsi_autopm_get_device() //0:0:0:0
__pm_runtime_resume(RPM_GET_PUT)
rpm_resume
status = RPM_RESUMING----->2
After power.runtime_status of consumer 0:0:0:0 was changed to
RPM_SUSPENDED (1), and before scsi_runtime_idle returned -16 (EBUSY) to
__rpm_callback, process 2 changed power.runtime_status of consumer 0:0:0:0
to RPM_RESUMING (2). Condition 3 therefore became true and
__rpm_put_suppliers was called, so the consumer resumed with a decremented
supplier usage_count.
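The interleaving above can be replayed with a small userspace model. This is
only a sketch under stated assumptions, not kernel code: struct model_consumer,
condition3(), model_put_suppliers() and replay() are hypothetical names, only
the clause of the check quoted in the trace is modelled, and the starting
supplier usage count of 1 is an arbitrary illustrative value.

```c
#include <stdbool.h>

/* Userspace model only; the names mirror the kernel's, nothing here is kernel code. */
enum rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };

#define MODEL_EBUSY 16	/* the idle callback returned -16 (EBUSY) in the trace */

/* Hypothetical stand-in for consumer 0:0:0:0 and its supplier reference. */
struct model_consumer {
	enum rpm_status runtime_status;
	int supplier_usage_count;
};

/* Condition 3 from the trace; only the clause quoted there is modelled. */
static bool condition3(bool use_links, enum rpm_status status, int retval)
{
	return use_links && (status == RPM_RESUMING && retval);
}

/* Stand-in for __rpm_put_suppliers(): drop one supplier reference. */
static void model_put_suppliers(struct model_consumer *c)
{
	c->supplier_usage_count--;
}

/*
 * Replay the trace in order. If racing_resume is true, process 2 changes the
 * status to RPM_RESUMING (step 2) before process 1 evaluates condition 3.
 * Returns the supplier usage count left after the replay.
 */
static int replay(bool racing_resume)
{
	/* Assumption: the supplier starts with one reference held. */
	struct model_consumer c = { RPM_ACTIVE, 1 };
	int retval;

	c.runtime_status = RPM_SUSPENDING;	/* process 1: rpm_suspend(RPM_AUTO) */
	c.runtime_status = RPM_SUSPENDED;	/* step 1 */

	if (racing_resume)
		c.runtime_status = RPM_RESUMING;	/* step 2, process 2 */

	retval = -MODEL_EBUSY;	/* scsi_runtime_idle() returns to __rpm_callback() */

	if (condition3(true, c.runtime_status, retval))	/* step 3 */
		model_put_suppliers(&c);

	return c.supplier_usage_count;
}
```

In this model replay(false) leaves the supplier count at 1, while replay(true)
drops it to 0 even though the consumer is on its way back to active.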
Please let me know your thoughts on this.
Regards,
Nitin
On 8/2/2022 7:03 PM, Peter Wang wrote:
On 8/2/22 7:01 PM, Rafael J. Wysocki wrote:
On Tue, Aug 2, 2022 at 5:19 AM Peter Wang <peter.wang@xxxxxxxxxxxx>
wrote:
Hi Rafael,
Yes, it is very clear!
I missed this important key point that usage_count is always >
rpm_active 1.
I think this patch could work.
Thanks.
Peter
Hi Rafael,
After testing with commit 887371066039011144b4a94af97d9328df6869a2 ("PM:
runtime: Fix supplier device management during consumer probe") over the
past weeks, the supplier still suspends while the consumer is active "after"
pm_runtime_put_suppliers.
Do you have any idea about that?
Well, this means that the consumer probe doesn't bump up the
supplier's PM-runtime usage counter as appropriate.
You need to tell me more about what happens during the consumer probe.
Which driver is this?
Hi Rafael,
I had the same idea as you, but I still don't know how it could happen.
It is the upstream ufs driver in the scsi subsystem. Here is the call flow:
do_scan_async (process 1)
do_scsi_scan_host
scsi_scan_host_selected
scsi_scan_channel
__scsi_scan_target
scsi_probe_and_add_lun
scsi_alloc_sdev
slave_alloc -> setup link
scsi_add_lun
slave_configure -> enable rpm
scsi_sysfs_add_sdev
scsi_autopm_get_device <- get runtime pm
device_add <- invoke sd_probe in process 2
scsi_autopm_put_device <- put runtime pm, point 1
driver_probe_device (process 2)
__driver_probe_device
pm_runtime_get_suppliers
really_probe
sd_probe
scsi_autopm_get_device <- get runtime pm, point 2
pm_runtime_set_autosuspend_delay <- set rpm delay to 2s
scsi_autopm_put_device <- put runtime pm
pm_runtime_put_suppliers <- (link->rpm_active = 1)
After process 1 calls scsi_autopm_put_device (point 1) and lets the consumer
enter suspend, process 2 calling scsi_autopm_get_device (point 2) may resume
the consumer without bumping up the supplier's PM-runtime usage counter as
appropriate.
Thanks.
Peter