Re: Locking issues in "mmc: rtsx: add support for pre_req and post_req"

From: micky
Date: Mon Apr 21 2014 - 04:41:47 EST


Hi Peter,

I'm considering not using spinlock to get it work and will resend the patch later.
Thank you.

Best Regards.
micky
On 04/19/2014 07:13 AM, Peter Wu wrote:
So, I reverted c42deffd5b53c9e583d83c7964854ede2f12410d ("mmc: rtsx: add
support for pre_req and post_req") on top of v3.15-rc1-49-g10ec34f and the
hang issue went away.

There is something that is possibly problematic. All three tasklets (cmd,
data, finish) try to spinlock on host->lock. According to the tasklets
documentation[1], they always run at interrupts (i.e., at any time, possibly
right after tasklet_schedule if I got this right?).

These tasklets however do get scheduled *under* the host->lock which will
cause a deadlock. This proposed patch ("mmc: rtsx: fix possible circular
locking dependency") fixes the issue for sd_isr_done_transfer, but there are
others:

1. sd_request_timeout:
125 spin_lock_irqsave(&host->lock, flags);
...
139 out:
140 tasklet_schedule(&host->finish_tasklet); // <--
141 spin_unlock_irqrestore(&host->lock, flags);

2. sd_get_rsp (cmd_tasklet!):
429 spin_lock_irqsave(&host->lock, flags);
...
506 tasklet_schedule(&host->finish_tasklet); // <--
507 spin_unlock_irqrestore(&host->lock, flags);

3. sd_finish_multi_rw (data_tasklet!):
657 spin_lock_irqsave(&host->lock, flags);
...
684 tasklet_schedule(&host->finish_tasklet); // <--
685 spin_unlock_irqrestore(&host->lock, flags);

4. sdmmc_request:
921 mutex_lock(&pcr->pcr_mutex);
922 spin_lock_irqsave(&host->lock, flags);
...
967 tasklet_schedule(&host->finish_tasklet); // <--
968 spin_unlock_irqrestore(&host->lock, flags);

5. rtsx_pci_sdmmc_drv_remove:
1526 spin_lock_irqsave(&host->lock, flags);
1527 if (host->mrq) {
...
1541 tasklet_schedule(&host->finish_tasklet); // <--
1542 }
1543 spin_unlock_irqrestore(&host->lock, flags);
1544
1545 del_timer_sync(&host->timer);
1546 tasklet_kill(&host->cmd_tasklet);
1547 tasklet_kill(&host->data_tasklet);
1548 tasklet_kill(&host->finish_tasklet); // <--

pcr_mutex (un)locking:
- gets locked in sdmmc_request
- gets unlocked in sd_finish_request (finish_tasklet).
- gets locked/unlocked in sdmmc_set_ios
- gets locked/unlocked in sdmmc_get_ro
- gets locked/unlocked in sdmmc_get_cd
- gets locked/unlocked in sdmmc_switch_voltage
- gets locked/unlocked in sdmmc_execute_tuning

finish_tasklet (sd_finish_request()) gets scheduled in:
- sd_request_timeout (under host->lock; called on timer expiration)
- error path of sd_send_cmd
- get_rsp (cmd_tasklet; under host->lock)
- error path of sd_start_multi_rw (after mod timer)
- sd_finish_multi_rw (under host->lock)
- sdmmc_request (under host->lock in error path; without host->lock
elsewhere)
- rtsx_pci_sdmmc_drv_remove (under host->lock)

sd_request_timeout (timer) related:
- deleted in sd_finish_request under host->lock (also assumes pcr_mutex)
- sd_send_cmd (timer set with 100ms timeout, not called in error path)
- sd_start_multi_rw (timeout set to 10 HZ, called before error path is
checked in which finish_tasklet gets scheduled)
Note: sd_request_timeout claims+releases host->lock.

If I understand it correctly, host->lock is responsible for protecting the
realtek_pci_sdmmc structure. Why is tasklet_schedule() on the same lock?

Shouldn't the above five tasklet_schedule calls be moved outside the lock?
Will it be problematic if the same tasklet gets executed multiple times (if
that is possible?).

Does it really need that much locking? dw_mmc.c also implements pre_req, but
uses tasklets without needing to lock anything.

Kind regards,
Peter

[1]: http://www.makelinux.net/ldd3/chp-7-sect-5

On Friday 18 April 2014 16:00:53 Peter Wu wrote:
On Wednesday 16 April 2014 09:38:44 micky_ching@xxxxxxxxxxxxxx wrote:
From: Micky Ching <micky_ching@xxxxxxxxxxxxxx>

To avoid dead lock, we need make sure host->lock is always acquire
before pcr->lock. But in irq handler, we acquired pcr->lock in rtsx mfd
driver, and sd_isr_done_transfer() is called during pcr->lock already
acquired. Since in sd_isr_done_transfer() the only work we do is schdule
tasklet, the cmd_tasklet and data_tasklet never conflict, so it is safe
to remove spin_lock() here.

Signed-off-by: Micky Ching <micky_ching@xxxxxxxxxxxxxx>
---

drivers/mmc/host/rtsx_pci_sdmmc.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
This patch came from
https://lkml.kernel.org/r/534DE1D7.3000308@xxxxxxxxxxxxxx ("Re:
rtsx_pci_sdmmc lockdep splat").

With v3.15-rc1-49-g10ec34f, I have a hung machine when inserting a SD card.
lockdep was not enabled for the kernel, I have not bisected yet.
This patch on top of that kernel version does not help (tested by
rmmod rtsx_pci_sdmmc and insmod the patched one).

Console (as typed over from a picture, sorry for any typos):
WARNING: CPU: 1 PID: 0 at kernel/locking/mutex.c:698
DEBUG_LOCKS_WARN_ON(in_interrupt())
Modules linked in: ...
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15-rc1-custom-000049-g10ec34f #5
Hardware name: Shuttle Inc. XS36V/XS36V, BIOS 1.11 12/18/2012
Call trace:
<IRQ> dump_stack
warn_slowpath_common
warn_slowpath_fmt
__mutex_unlock_slowpath
mutex_unlock
sd_finish_request [rtsx_pci_sdmmc]
tasklet_action
__do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
<EOI> ? cpuidle_enter_state
cpuidle_enter
cpu_startup_entry
start_secondary
---[end trace ...]---
(60 seconds later:)
INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 1, t=18004
jiffies, g=3264, c=3263, q=2) INFO: Stall ended before state dump start

I also managed to get this trace about 106 seconds later when switching TTY:
INFO: task kworker/... blocked for more than 120 seconds
Workqueue: kmmcd mm_rescan [mmc_core]
Call trace:
? update_rq_clock.part80
? internal_add_timer
schedule
schedule_preempt
__mutex_lock_slowpath
mutex_lock
sdmmc_request [rtsx_pci_sdmmc]
mmc_start_request [mmc_core]
__mmc_start_req [mmc_core]
mmc_wait_for_cmd [mmc_core]
? mmc_release_host [mmc_core]
mmc_io_rw_direct_host [mmc_core]

I'll try to get a lockdep kernel and text logs later, but perhaps you
already know the issue?

Peter

diff --git a/drivers/mmc/host/rtsx_pci_sdmmc.c
b/drivers/mmc/host/rtsx_pci_sdmmc.c index 453e1d4..40695e0 100644
--- a/drivers/mmc/host/rtsx_pci_sdmmc.c
+++ b/drivers/mmc/host/rtsx_pci_sdmmc.c
@@ -108,12 +108,10 @@ static void sd_isr_done_transfer(struct
platform_device *pdev) {

struct realtek_pci_sdmmc *host = platform_get_drvdata(pdev);

- spin_lock(&host->lock);

if (host->cmd)

tasklet_schedule(&host->cmd_tasklet);

- if (host->data)
+ else if (host->data)

tasklet_schedule(&host->data_tasklet);

- spin_unlock(&host->lock);

}
static void sd_request_timeout(unsigned long host_addr)
.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/