Hi Bao D.,
We have run some tests based on your RFC v3 patches, and an issue was
found.
kworker/u16:4: BUG: scheduling while atomic: kworker/u16:4/5736/0x00000002
kworker/u16:4: [name:core&]Preemption disabled at:
kworker/u16:4: [<ffffffef97e33024>] ufshcd_mcq_sq_cleanup+0x9c/0x27c
kworker/u16:4: CPU: 2 PID: 5736 Comm: kworker/u16:4 Tainted: G S W OE
kworker/u16:4: Workqueue: ufs_eh_wq_0 ufshcd_err_handler
kworker/u16:4: Call trace:
kworker/u16:4: dump_backtrace+0x108/0x15c
kworker/u16:4: show_stack+0x20/0x30
kworker/u16:4: dump_stack_lvl+0x6c/0x8c
kworker/u16:4: dump_stack+0x20/0x44
kworker/u16:4: __schedule_bug+0xd4/0x100
kworker/u16:4: __schedule+0x660/0xa5c
kworker/u16:4: schedule+0x80/0xec
kworker/u16:4: schedule_hrtimeout_range_clock+0xa0/0x140
kworker/u16:4: schedule_hrtimeout_range+0x1c/0x30
kworker/u16:4: usleep_range_state+0x88/0xd8
kworker/u16:4: ufshcd_mcq_sq_cleanup+0x170/0x27c
kworker/u16:4: ufshcd_clear_cmds+0x78/0x184
kworker/u16:4: ufshcd_wait_for_dev_cmd+0x234/0x348
kworker/u16:4: ufshcd_exec_dev_cmd+0x220/0x298
kworker/u16:4: ufshcd_verify_dev_init+0x68/0x124
kworker/u16:4: ufshcd_probe_hba+0x390/0x9bc
kworker/u16:4: ufshcd_host_reset_and_restore+0x74/0x158
kworker/u16:4: ufshcd_reset_and_restore+0x70/0x31c
kworker/u16:4: ufshcd_err_handler+0xad4/0xe58
kworker/u16:4: process_one_work+0x214/0x5b8
kworker/u16:4: worker_thread+0x2d4/0x448
kworker/u16:4: kthread+0x110/0x1e0
kworker/u16:4: ret_from_fork+0x10/0x20
kworker/u16:4: ------------[ cut here ]------------
On Wed, 2023-03-29 at 03:01 -0700, Bao D. Nguyen wrote:
+/**
+ * ufshcd_mcq_sq_cleanup - Clean up Submission Queue resources
+ * associated with the pending command.
+ * @hba - per adapter instance.
+ * @task_tag - The command's task tag.
+ * @result - Result of the Clean up operation.
+ *
+ * Returns 0 and result on completion. Returns error code if
+ * the operation fails.
+ */
+int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag, int *result)
+{
+ struct ufshcd_lrb *lrbp = &hba->lrb[task_tag];
+ struct scsi_cmnd *cmd = lrbp->cmd;
+ struct ufs_hw_queue *hwq;
+ void __iomem *reg, *opr_sqd_base;
+ u32 nexus, i, val;
+ int err;
+
+ if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
+ if (!cmd)
+ return FAILED;
+ hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
+ } else {
+ hwq = hba->dev_cmd_queue;
+ }
+
+ i = hwq->id;
+
+ spin_lock(&hwq->sq_lock);
+ /* stop the SQ fetching before working on it */
+ err = ufshcd_mcq_sq_stop(hba, hwq);
+ if (err)
+ goto unlock;
+
+ /* SQCTI = EXT_IID, IID, LUN, Task Tag */
+ nexus = lrbp->lun << 8 | task_tag;
+ opr_sqd_base = mcq_opr_base(hba, OPR_SQD, i);
+ writel(nexus, opr_sqd_base + REG_SQCTI);
+
+ /* SQRTCy.ICU = 1 */
+ writel(SQ_ICU, opr_sqd_base + REG_SQRTC);
+
+ /* Poll SQRTSy.CUS = 1. Return result from SQRTSy.RTC */
+ reg = opr_sqd_base + REG_SQRTS;
+ err = read_poll_timeout(readl, val, val & SQ_CUS, 20,
+ MCQ_POLL_US, false, reg);
The read_poll_timeout() above was ufshcd_mcq_poll_register() in the last
patch, right? Since spin_lock() disables preemption, and
ufshcd_mcq_poll_register() calls usleep_range(), we hit the KE reported
above. The same issue seems to still exist here, because
read_poll_timeout() also sleeps.
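Just thinking out loud: one way to avoid sleeping while sq_lock is held
might be the atomic poll helper from <linux/iopoll.h>, roughly as below
(a sketch only, reusing the patch's delay/timeout values; not tested on
our side):

        /*
         * Busy-wait instead of sleeping: read_poll_timeout_atomic() uses
         * udelay() rather than usleep_range(), so it does not schedule
         * while preemption is disabled by spin_lock().
         */
        err = read_poll_timeout_atomic(readl, val, val & SQ_CUS, 20,
                                       MCQ_POLL_US, false, reg);

Alternatively, the polling could be moved outside the spinlock, or
sq_lock could become a sleepable lock; we have not checked which fits
the submission path better.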
Skipping ufshcd_mcq_sq_cleanup() by returning FAILED directly, so that
the ufshcd error handler triggers a reset, successfully recovers the
host.
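In case it is useful, the hack we used for that test looks roughly like
the sketch below (hypothetical; it assumes the MCQ branch of
ufshcd_clear_cmds() in this series, so the exact context may differ):

        /*
         * Test-only change, not a proposed fix: skip the SQ cleanup in
         * MCQ mode and report FAILED so that ufshcd_err_handler()
         * escalates to a full host reset.
         */
        if (is_mcq_enabled(hba))
                return FAILED;  /* instead of ufshcd_mcq_sq_cleanup() */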
BTW, is there a change list between RFC v3 and this v1 patch? :)
Thanks
Po-Wen