Re: Fwd: Marvell 88SE6320 SAS controller (mvsas) cannot survive ACPI S3 or ACPI S4

From: Bagas Sanjaya
Date: Thu Oct 26 2023 - 10:12:38 EST


On Thu, Oct 26, 2023 at 05:56:03PM +0900, Damien Le Moal wrote:
> On 2023/10/26 17:25, Bagas Sanjaya wrote:
> > Hi,
> >
> > I notice a bug report on Bugzilla [1]. Quoting from it:
>
> [...]
>
> >> [ 437.249448] PM: suspend entry (deep)
> >> [ 437.255308] Filesystems sync: 0.005 seconds
> >> [ 437.255570] Freezing user space processes
> >> [ 437.257093] Freezing user space processes completed (elapsed 0.001 seconds)
> >> [ 437.257097] OOM killer disabled.
> >> [ 437.257098] Freezing remaining freezable tasks
> >> [ 437.258226] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> >> [ 437.258281] printk: Suspending console(s) (use no_console_suspend to debug)
> >> [ 437.291778] sd 0:0:0:0: [sdb] Synchronizing SCSI cache
> >> [ 437.291825] sd 0:0:1:0: [sdc] Synchronizing SCSI cache
> >> [ 437.292083] sd 0:0:0:0: [sdb] Stopping disk
> >> [ 437.292083] sd 0:0:1:0: [sdc] Stopping disk
> >> [ 438.363660] sd 1:0:0:0: [sda] Synchronizing SCSI cache
> >> [ 438.363760] sd 1:0:0:0: [sda] Stopping disk
>
> Given this message, this does not look like the latest kernel.
>
> >> [ 589.081341] drivers/scsi/mvsas/mv_sas.c 1304:mvs_I_T_nexus_reset for device[1]:rc= 0
> >> [ 610.481270] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> >> [ 610.481280] rcu: 11-...0: (0 ticks this GP) idle=4f84/1/0x4000000000000000 softirq=19873/19873 fqs=1159
> >> [ 610.481292] (detected by 5, t=5252 jiffies, g=53581, q=31630 ncpus=12)
> >> [ 610.481299] Sending NMI from CPU 5 to CPUs 11:
> >> [ 610.481309] NMI backtrace for cpu 11
> >> [ 610.481312] CPU: 11 PID: 3152 Comm: kworker/u32:59 Tainted: G I 6.1.57-vanilla #14
> >> [ 610.481318] Hardware name: System manufacturer System Product Name/P6T WS PRO, BIOS 1205 09/24/2010
> >> [ 610.481321] Workqueue: events_unbound async_run_entry_fn
> >> [ 610.481329] RIP: 0010:mvs_int_rx+0x81/0x150 [mvsas]
> >> [ 610.481346] Code: 00 00 44 39 75 70 74 47 48 8b 45 60 45 89 e6 41 81 e6 ff 03 00 00 41 8d 56 01 8b 1c 90 49 89 d4 41 89 df 41 81 e7 00 00 08 00 <f7> c3 00 00 01 00 74 58 31 d2 89 de 48 89 ef e8 0b f9 ff ff 45 85
> >> [ 610.481350] RSP: 0018:ffffb61f06acbb60 EFLAGS: 00000046
> >> [ 610.481354] RAX: ffff9a7cc2658000 RBX: 0000000000010000 RCX: 0000000000000000
> >> [ 610.481358] RDX: 000000000000026e RSI: 0000000000010000 RDI: ffff9a7ce2660000
> >> [ 610.481361] RBP: ffff9a7ce2660000 R08: ffff9a7ce2660f00 R09: ffff9a7ce2660000
> >> [ 610.481364] R10: ffff9a7ce26600c8 R11: ffffffff884d4300 R12: 000000000000026e
> >> [ 610.481367] R13: 0000000000000000 R14: 000000000000026d R15: 0000000000000000
> >> [ 610.481371] FS: 0000000000000000(0000) GS:ffff9a7df7cc0000(0000) knlGS:0000000000000000
> >> [ 610.481375] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 610.481378] CR2: 0000563633425300 CR3: 0000000077210006 CR4: 00000000000206e0
> >> [ 610.481382] Call Trace:
> >> [ 610.481385] <NMI>
> >> [ 610.481389] ? nmi_cpu_backtrace.cold+0x1b/0x76
> >> [ 610.481398] ? nmi_cpu_backtrace_handler+0xd/0x20
> >> [ 610.481403] ? nmi_handle+0x5d/0x120
> >> [ 610.481410] ? mvs_int_rx+0x81/0x150 [mvsas]
> >> [ 610.481423] ? default_do_nmi+0x69/0x170
> >> [ 610.481428] ? exc_nmi+0x13c/0x170
> >> [ 610.481432] ? end_repeat_nmi+0x16/0x67
> >> [ 610.481443] ? mvs_int_rx+0x81/0x150 [mvsas]
> >> [ 610.481457] ? mvs_int_rx+0x81/0x150 [mvsas]
> >> [ 610.481470] ? mvs_int_rx+0x81/0x150 [mvsas]
> >> [ 610.481483] </NMI>
> >> [ 610.481484] <TASK>
> >> [ 610.481487] mvs_do_release_task+0x3f/0x90 [mvsas]
> >> [ 610.481501] mvs_release_task+0x13e/0x1a0 [mvsas]
> >> [ 610.481516] mvs_I_T_nexus_reset+0xb2/0xd0 [mvsas]
> >> [ 610.481530] ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
> >> [ 610.481552] sas_ata_hard_reset+0x48/0x80 [libsas]
> >> [ 610.481575] ata_eh_reset+0x2e5/0x1090 [libata]
> >> [ 610.481631] ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
> >> [ 610.481652] ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
> >> [ 610.481676] ata_eh_recover+0x2e6/0xe00 [libata]
> >> [ 610.481728] ? __wake_up_klogd.part.0+0x56/0x80
> >> [ 610.481735] ? vprintk_emit+0x207/0x290
> >> [ 610.481739] ? smp_ata_check_ready_type+0xb0/0xb0 [libsas]
> >> [ 610.481760] ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
> >> [ 610.481783] ? smp_ata_check_ready_type+0xb0/0xb0 [libsas]
> >> [ 610.481804] ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
> >> [ 610.481824] ata_do_eh+0x75/0xf0 [libata]
> >> [ 610.481876] ? del_timer_sync+0x6f/0xb0
> >> [ 610.481884] ata_scsi_port_error_handler+0x3a8/0x800 [libata]
> >> [ 610.481938] async_sas_ata_eh+0x44/0x7f [libsas]
> >> [ 610.481960] async_run_entry_fn+0x30/0x130
> >> [ 610.481966] process_one_work+0x1c7/0x380
> >> [ 610.481974] worker_thread+0x4d/0x380
> >> [ 610.481981] ? rescuer_thread+0x3a0/0x3a0
> >> [ 610.481987] kthread+0xe9/0x110
> >> [ 610.481992] ? kthread_complete_and_exit+0x20/0x20
> >> [ 610.481999] ret_from_fork+0x22/0x30
> >> [ 610.482009] </TASK>
> >> [ 665.286198] NMI watchdog: Watchdog detected hard LOCKUP on cpu 11
>
> Could be due to the libata deadlock without the recent suspend/resume fixes. Or
> this is yet another adapter that was not tested for suspend/resume. mpt3sas
> crashes the machine 100% of the time as well. I had no time to dig into that issue.
>

The reporter on Bugzilla [1] said:

> Hello again,
> 6.6-rc7 was unable to resume disks from S3, as expected.
> Basically mvsas does not resume the attached devices at all.
> The suspend/resume logic was never implemented and nothing happens on resume.

It looks like the mvsas driver doesn't implement any S3/S4 (suspend/resume) logic at all, right?

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=218030#add_comment

--
An old man doll... just what I always wanted! - Clara