RE: [PATCH] ASoC: hdac_hdmi: remove cancel_work_sync in runtime suspend

From: Lu, Brent
Date: Wed Jul 15 2020 - 09:20:13 EST


> A deadlock is identified when there are three contexts running at the same
> time:
> - a HDMI jack work which is calling snd_soc_dapm_sync().
> - user space is calling snd_pcm_release() to close pcm device.
> - pm is calling runtime suspend function of HDMI codec driver.
>
> By removing the clear_dapm_works() invocation in the
> hdac_hdmi_runtime_suspend() function, the snd_pcm_release() could
> always returns from dapm_power_widgets() function call without blocking
> the hdac_hdmi_jack_dapm_work() work thread or being blocked by the
> hdac_hdmi_runtime_suspend() function. The purpose of the jack work is to
> enable/disable the dapm jack pin so it's not necessary to cancel the work in
> runtime suspend function which is usually called when pcm device is closed.
>
> Signed-off-by: Brent Lu <brent.lu@xxxxxxxxx>
> ---

It is found on CML Chromebox 'Puff' which is using Chrome-v4.19 branch. Kernel
resets itself because tasks getting stuck in mutex for 2 mins. Following is the
reproduce steps:

1. Connect to external HDMI display.
2. Open web page "https://bitmovin.com/demos/drm";
3. Reload the web page multiple times until device reboot

snd_pcm_release()
- In dapm_power_widgets() waitng for dapm_pre_sequence_async() to
complete.
- Holding the card->dapm_mutex when calling
snd_soc_dapm_stream_event() function.
- Call trace:
[ 1352.228063] __schedule+0x4b8/0xf5c
[ 1352.228074] schedule+0x3f/0x7a
[ 1352.228083] async_synchronize_cookie_domain+0xb5/0x11c
[ 1352.228091] ? init_wait_entry+0x2e/0x2e
[ 1352.228101] dapm_power_widgets+0x1eb/0x3b5
[ 1352.228111] snd_soc_dapm_stream_event+0x80/0x92
[ 1352.228121] dpcm_dapm_stream_event+0x58/0x7c
[ 1352.228130] dpcm_fe_dai_close+0xa3/0x118
[ 1352.228141] snd_pcm_release_substream+0x72/0xbc
[ 1352.228148] snd_pcm_release+0x3e/0x9b
[ 1352.228156] __fput+0xfa/0x1ef
[ 1352.228166] task_work_run+0x7f/0xa7
[ 1352.228174] exit_to_usermode_loop+0xa5/0xa7
[ 1352.228182] do_syscall_64+0xfb/0x14b
[ 1352.228190] entry_SYSCALL_64_after_hwframe+0x44/0xa9

dapm_pre_sequence_async()
- In rpm_resume() waiting for hdac_hdmi_runtime_suspend() to complete.
- Call trace:
[ 1352.227903] Workqueue: events_unbound async_run_entry_fn
[ 1352.227908] Call Trace:
[ 1352.227917] __schedule+0x4b8/0xf5c
[ 1352.227926] schedule+0x3f/0x7a
[ 1352.227934] rpm_resume+0x16e/0x5ee
[ 1352.227944] ? init_wait_entry+0x2e/0x2e
[ 1352.227951] __pm_runtime_resume+0x6b/0x83
[ 1352.227959] dapm_pre_sequence_async+0x2c/0x8f
[ 1352.227967] async_run_entry_fn+0x3d/0xd1
[ 1352.227975] process_one_work+0x19b/0x4c9
[ 1352.227983] worker_thread+0x10d/0x284
[ 1352.227991] kthread+0x138/0x140
[ 1352.227998] ? process_one_work+0x4c9/0x4c9
[ 1352.228006] ? kthread_associate_blkcg+0xbe/0xbe
[ 1352.228012] ret_from_fork+0x1f/0x40

hdac_hdmi_runtime_suspend()
- In clear_dapm_works() waiting for hdac_hdmi_jack_dapm_work() to
complete.
- Call trace:
[ 1352.228359] Workqueue: pm pm_runtime_work
[ 1352.228379] Call Trace:
[ 1352.228396] __schedule+0x4b8/0xf5c
[ 1352.228410] ? delete_node+0x74/0x1ab
[ 1352.228426] schedule+0x3f/0x7a
[ 1352.228442] schedule_timeout+0x11f/0x176
[ 1352.228461] do_wait_for_common+0x118/0x164
[ 1352.228477] ? console_conditional_schedule+0x28/0x28
[ 1352.228494] ? queue_core_balance+0x75/0x75
[ 1352.228509] wait_for_completion+0x4c/0x58
[ 1352.228526] __flush_work+0x112/0x1a1
[ 1352.228540] ? insert_wq_barrier+0x118/0x118
[ 1352.228556] __cancel_work_timer+0x9f/0x1ad
[ 1352.228569] ? __switch_to_asm+0x41/0x70
[ 1352.228582] ? __switch_to_asm+0x35/0x70
[ 1352.228596] ? __switch_to_asm+0x41/0x70
[ 1352.228613] clear_dapm_works+0x3f/0x60 [snd_soc_hdac_hdmi]
[ 1352.228627] ? hdmi_codec_resume+0x40/0x40 [snd_soc_hdac_hdmi]
[ 1352.228642] hdac_hdmi_runtime_suspend+0x20/0xac [snd_soc_hdac_hdmi]
[ 1352.228654] __rpm_callback+0x8a/0xf3
[ 1352.228669] ? hdmi_codec_resume+0x40/0x40 [snd_soc_hdac_hdmi]
[ 1352.228680] rpm_suspend+0x3b4/0x73e
[ 1352.228696] pm_runtime_work+0x5d/0xa0
[ 1352.228708] process_one_work+0x19b/0x4c9
[ 1352.228717] worker_thread+0x10d/0x284
[ 1352.228727] kthread+0x138/0x140
[ 1352.228736] ? process_one_work+0x4c9/0x4c9
[ 1352.228746] ? kthread_associate_blkcg+0xbe/0xbe
[ 1352.228761] ret_from_fork+0x1f/0x40

hdac_hdmi_jack_dapm_work()
- In snd_soc_dapm_sync() waiting for waiting for card->dapm_mutex to
be released.
- The mutex is held by snd_pcm_release(), deadlock here.
- Call trace:
[ 1352.228835] Workqueue: events hdac_hdmi_jack_dapm_work [snd_soc_hdac_hdmi]
[ 1352.228845] Call Trace:
[ 1352.228860] __schedule+0x4b8/0xf5c
[ 1352.228878] ? mutex_spin_on_owner+0x86/0x94
[ 1352.228896] schedule_preempt_disabled+0x47/0x8a
[ 1352.228911] __mutex_lock_common+0x25c/0x52c
[ 1352.228929] __mutex_lock_slowpath+0x20/0x26
[ 1352.228946] snd_soc_dapm_sync+0x1b/0x4d
[ 1352.228960] process_one_work+0x19b/0x4c9
[ 1352.228976] worker_thread+0x1e4/0x284
[ 1352.228994] kthread+0x138/0x140
[ 1352.229007] ? process_one_work+0x4c9/0x4c9
[ 1352.229021] ? kthread_associate_blkcg+0xbe/0xbe
[ 1352.229036] ret_from_fork+0x1f/0x40


Regards,
Brent

> sound/soc/codecs/hdac_hdmi.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/sound/soc/codecs/hdac_hdmi.c
> b/sound/soc/codecs/hdac_hdmi.c index f26b77f..9052edd 100644
> --- a/sound/soc/codecs/hdac_hdmi.c
> +++ b/sound/soc/codecs/hdac_hdmi.c
> @@ -2097,8 +2097,6 @@ static int hdac_hdmi_runtime_suspend(struct
> device *dev)
> if (!bus)
> return 0;
>
> - clear_dapm_works(hdev);
> -
> /*
> * Power down afg.
> * codec_read is preferred over codec_write to set the power state.
> --
> 2.7.4