Re: [PATCH] Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"

From: Palmer Dabbelt
Date: Thu Oct 27 2022 - 19:07:44 EST


On Sun, 23 Oct 2022 11:54:44 PDT (-0700), Conor Dooley wrote:
From: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>

This reverts commit 232ccac1bd9b5bfe73895f527c08623e7fa0752d.
If an AXI read to the PCIe controller on PolarFire SoC times out, the
system will stall, with an expected:
io scheduler mq-deadline registered
io scheduler kyber registered
microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
microchip-pcie 2000000000.pcie: MEM 0x2008000000..0x2087ffffff -> 0x0008000000
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: axi read request error
microchip-pcie 2000000000.pcie: axi read timeout
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
Freeing initrd memory: 7336K
mc_event_handler: 667402 callbacks suppressed
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
mc_event_handler: 666588 callbacks suppressed
<truncated>
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
mc_event_handler: 666748 callbacks suppressed
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 0-...0: (1 GPs behind) idle=19f/1/0x4000000000000002 softirq=34/36 fqs=2626
(detected by 1, t=5256 jiffies, g=-1151, q=1143 ncpus=4)
Task dump for CPU 0:
task:swapper/0 state:R running task stack: 0 pid: 1 ppid: 0 flags:0x00000008
Call Trace:
mc_event_handler: 666648 callbacks suppressed

With this patch applied, the system just locks up without RCU stalling:
io scheduler mq-deadline registered
io scheduler kyber registered
microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
microchip-pcie 2000000000.pcie: MEM 0x2008000000..0x2087ffffff -> 0x0008000000
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: axi read request error
microchip-pcie 2000000000.pcie: axi read timeout
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
Freeing initrd memory: 7332K

Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
Fixes: 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
Signed-off-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
---
I don't really want to post a revert, but it's been nearly a month since
I posted about my issue initially & 2 weeks without a reply to Palmer's
comments.
CC: samuel@xxxxxxxxxxxx
CC: aou@xxxxxxxxxxxxxxxxx
CC: atishp@xxxxxxxxxxxxxx
CC: daniel.lezcano@xxxxxxxxxx
CC: dmitriy@xxxxxxxxxxxx
CC: linux-kernel@xxxxxxxxxxxxxxx
CC: linux-riscv@xxxxxxxxxxxxxxxxxxx
CC: palmer@xxxxxxxxxxx
CC: paul.walmsley@xxxxxxxxxx
CC: tglx@xxxxxxxxxxxxx
---
drivers/clocksource/timer-riscv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index 969a552da8d2..a0d66fabf073 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -51,7 +51,7 @@ static int riscv_clock_next_event(unsigned long delta,
static unsigned int riscv_clock_event_irq;
static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
.name = "riscv_timer_clockevent",
- .features = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
+ .features = CLOCK_EVT_FEAT_ONESHOT,
.rating = 100,
.set_next_event = riscv_clock_next_event,
};

There's some discussion on that linked patch and we don't really have a fix yet, but IMO we're better off reverting this as it breaks the common case and it's not clear this is even a sane way to fix the bug.

Reviewed-by: Palmer Dabbelt <palmer@xxxxxxxxxxxx>
Acked-by: Palmer Dabbelt <palmer@xxxxxxxxxxxx>