Re: [PATCH] riscv: Do not save the scratch CSR during suspend

From: Palmer Dabbelt
Date: Tue Apr 09 2024 - 15:44:08 EST


On Thu, 21 Mar 2024 16:51:31 PDT (-0700), samuel.holland@xxxxxxxxxx wrote:
On 2024-03-14 11:55 PM, JeeHeng Sia wrote:


-----Original Message-----
From: Samuel Holland <samuel.holland@xxxxxxxxxx>
Sent: Wednesday, March 13, 2024 3:57 AM
To: Palmer Dabbelt <palmer@xxxxxxxxxxx>; linux-riscv@xxxxxxxxxxxxxxxxxxx
Cc: Samuel Holland <samuel.holland@xxxxxxxxxx>; Albert Ou <aou@xxxxxxxxxxxxxxxxx>; Andrew Jones <ajones@xxxxxxxxxxxxxxxx>;
Conor Dooley <conor.dooley@xxxxxxxxxxxxx>; Leyfoon Tan <leyfoon.tan@xxxxxxxxxxxxxxxx>; Paul Walmsley
<paul.walmsley@xxxxxxxxxx>; Pavel Machek <pavel@xxxxxx>; Rafael J. Wysocki <rafael@xxxxxxxxxx>; JeeHeng Sia
<jeeheng.sia@xxxxxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; linux-pm@xxxxxxxxxxxxxxx
Subject: [PATCH] riscv: Do not save the scratch CSR during suspend

While the processor is executing kernel code, the value of the scratch
CSR is always zero, so there is no need to save the value. Continue to
write the CSR during the resume flow, so we do not rely on firmware to
initialize it.

Signed-off-by: Samuel Holland <samuel.holland@xxxxxxxxxx>
---

arch/riscv/include/asm/suspend.h | 1 -
arch/riscv/kernel/suspend.c | 3 +--
2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
index 491296a335d0..6569eefacf38 100644
--- a/arch/riscv/include/asm/suspend.h
+++ b/arch/riscv/include/asm/suspend.h
@@ -13,7 +13,6 @@ struct suspend_context {
 	/* Saved and restored by low-level functions */
 	struct pt_regs regs;
 	/* Saved and restored by high-level functions */
-	unsigned long scratch;
 	unsigned long envcfg;
 	unsigned long tvec;
 	unsigned long ie;
diff --git a/arch/riscv/kernel/suspend.c b/arch/riscv/kernel/suspend.c
index 299795341e8a..3d306d8a253d 100644
--- a/arch/riscv/kernel/suspend.c
+++ b/arch/riscv/kernel/suspend.c
@@ -14,7 +14,6 @@

 void suspend_save_csrs(struct suspend_context *context)
 {
-	context->scratch = csr_read(CSR_SCRATCH);
 	if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
 		context->envcfg = csr_read(CSR_ENVCFG);
 	context->tvec = csr_read(CSR_TVEC);
@@ -37,7 +36,7 @@ void suspend_save_csrs(struct suspend_context *context)

 void suspend_restore_csrs(struct suspend_context *context)
 {
-	csr_write(CSR_SCRATCH, context->scratch);
+	csr_write(CSR_SCRATCH, 0);
If the register is always zero, do we need to explicitly write zero to the register during resume?

The register contains zero while executing in the kernel. While executing in
userspace, the value is nonzero. The value is checked at the beginning of
handle_exception(). We must ensure the value is zero before enabling interrupts,
or we might incorrectly think the interrupt was entered from userspace.

We don't know what the value will be when the hart comes out of non-retentive
suspend. Per the SBI HSM specification, Table 6: "All other registers remain in
an undefined state."
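
To make the argument above concrete, here is a minimal userspace model of it, not kernel code. Every name in it (model_scratch, model_trap_from_user(), model_nonretentive_resume(), model_restore_csrs()) is hypothetical and exists only for illustration. It assumes nothing beyond what is stated above: zero means "in the kernel", nonzero means "came from userspace", and the register contents are undefined after non-retentive suspend.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hart's scratch CSR; not a kernel symbol. */
static unsigned long model_scratch;

/*
 * Models the check at the start of handle_exception():
 * a nonzero scratch value is taken to mean the trap came from userspace.
 */
static bool model_trap_from_user(void)
{
	return model_scratch != 0;
}

/*
 * Models waking from non-retentive suspend: the register holds whatever
 * the platform left there, which the caller supplies as `leftover`.
 */
static void model_nonretentive_resume(unsigned long leftover)
{
	model_scratch = leftover;
}

/*
 * Models the patched restore path: write a constant zero instead of a
 * saved copy, since the value is always zero while in the kernel.
 */
static void model_restore_csrs(void)
{
	model_scratch = 0;
}
```

In this model, calling model_nonretentive_resume() with any nonzero value and then taking a trap before model_restore_csrs() would be misclassified as coming from userspace, which is why the explicit write of zero has to stay in the resume flow even though the saved copy can go.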

We're also not setting it at all in `.macro suspend_restore_csrs`, which I think is just a bug?

That said, I'm kind of seeing bugs everywhere I look in this now -- what about all the other registers we can poke, like timers/counters or the V/F state (or anything from M-mode, though maybe that's just someone else's problem)?

I also think we'd break on medlow kernels, as a bunch of this relies on medany-as-PIC for the SATP-off transition.

Maybe I'm going crazy here, though...

Regards,
Samuel