Re: [PATCH 2/2] riscv: fix the IPI missing issue in nommu mode
From: Palmer Dabbelt
Date:  Wed Mar 18 2020 - 21:45:17 EST
On Tue, 03 Mar 2020 01:34:18 PST (-0800), greentime.hu@xxxxxxxxxx wrote:
This patch fixes the missing IPI (inter-processor interrupt) issue. It
failed because it used hartid_mask to iterate in for_each_cpu(); however, the
cpu_mask and hartid_mask may not always be the same. In my case the IPI is
never sent to hartid 4 because it is skipped in the for_each_cpu() loop.
This can be reproduced on the QEMU sifive_u machine with the following command:
qemu-system-riscv64 -nographic -smp 5 -m 1G -M sifive_u -kernel \
arch/riscv/boot/loader
It hangs in csd_lock_wait(csd) because csd_unlock(csd) is never called.
It is never called because hartid 4 does not receive the IPI that would
release this lock. The calling hart does not send the IPI to hartid 4
because hartid 4 is skipped in for_each_cpu(). It is skipped because
"(cpu) < nr_cpu_ids" is false: the hartid is 4 and nr_cpu_ids is 4.
Therefore for_each_cpu() should iterate over the cpumask instead of
hartid_mask.
        /* Send a message to all CPUs in the map */
        arch_send_call_function_ipi_mask(cfd->cpumask_ipi);
        if (wait) {
                for_each_cpu(cpu, cfd->cpumask) {
                        call_single_data_t *csd;
                        csd = per_cpu_ptr(cfd->csd, cpu);
                        csd_lock_wait(csd);
                }
        }
        for ((cpu) = -1;                                \
                (cpu) = cpumask_next((cpu), (mask)),    \
                (cpu) < nr_cpu_ids;)
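The skip described above can be shown outside the kernel with a small user-space sketch. It is only an illustration: mask_next() and visited_bits() are hypothetical stand-ins for cpumask_next() and for_each_cpu(), the mask is a plain 32-bit word, and the example mask values are assumed rather than taken from a real boot.

```c
#include <stdint.h>

/* Assumed value, matching the report: four logical CPUs (nr_cpu_ids == 4). */
#define NR_CPU_IDS 4u

/*
 * Hypothetical stand-in for cpumask_next(): return the next set bit after
 * prev, or NR_CPU_IDS when no set bit remains below NR_CPU_IDS.
 */
static unsigned int mask_next(int prev, uint32_t mask)
{
	unsigned int bit;

	for (bit = (unsigned int)(prev + 1); bit < NR_CPU_IDS; bit++)
		if (mask & (UINT32_C(1) << bit))
			return bit;
	return NR_CPU_IDS;
}

/*
 * Walk the mask the way the for_each_cpu() loop above does and return the
 * set of bits the loop body actually ran for.
 */
static uint32_t visited_bits(uint32_t mask)
{
	uint32_t visited = 0;
	unsigned int cpu;

	for (cpu = mask_next(-1, mask); cpu < NR_CPU_IDS;
	     cpu = mask_next((int)cpu, mask))
		visited |= UINT32_C(1) << cpu;
	return visited;
}
```

With an assumed hart ID mask of 0x1E (harts 1 to 4), visited_bits() returns 0x0E: bit 4 is never reached because 4 is not less than NR_CPU_IDS. A logical CPU mask of 0x0F (CPUs 0 to 3) is visited in full.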
With this patch applied, it boots to the login console.
Fixes: b2d36b5668f6 ("riscv: provide native clint access for M-mode")
Signed-off-by: Greentime Hu <greentime.hu@xxxxxxxxxx>
---
 arch/riscv/include/asm/clint.h | 8 ++++----
 arch/riscv/kernel/smp.c        | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/riscv/include/asm/clint.h b/arch/riscv/include/asm/clint.h
index 6eaa2eedd694..a279b17a6aad 100644
--- a/arch/riscv/include/asm/clint.h
+++ b/arch/riscv/include/asm/clint.h
@@ -15,12 +15,12 @@ static inline void clint_send_ipi_single(unsigned long hartid)
 	writel(1, clint_ipi_base + hartid);
 }
-static inline void clint_send_ipi_mask(const struct cpumask *hartid_mask)
+static inline void clint_send_ipi_mask(const struct cpumask *mask)
 {
-	int hartid;
+	int cpu;
-	for_each_cpu(hartid, hartid_mask)
-		clint_send_ipi_single(hartid);
+	for_each_cpu(cpu, mask)
+		clint_send_ipi_single(cpuid_to_hartid_map(cpu));
 }
 static inline void clint_clear_ipi(unsigned long hartid)
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index eb878abcaaf8..e0a6293093f1 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -96,7 +96,7 @@ static void send_ipi_mask(const struct cpumask *mask, enum ipi_message_type op)
 	if (IS_ENABLED(CONFIG_RISCV_SBI))
 		sbi_send_ipi(cpumask_bits(&hartid_mask));
 	else
-		clint_send_ipi_mask(&hartid_mask);
+		clint_send_ipi_mask(mask);
 }
 static void send_ipi_single(int cpu, enum ipi_message_type op)
Thanks.  We should really stop putting hart IDs in cpumasks, as that's just
nonsense.
Reviewed-by: Palmer Dabbelt <palmerdabbelt@xxxxxxxxxx>
I'm taking these both onto fixes.
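For readers following along, the approach taken by the patch (iterate over logical CPU numbers and translate each one to a hart ID only at the point of sending) can be sketched in user space as follows. The cpu_to_hart[] table and harts_signalled() are hypothetical names, and the table contents are assumed for illustration; in the kernel the translation is done by cpuid_to_hartid_map().

```c
#include <stdint.h>

/* Assumed value, matching the report: four logical CPUs. */
#define NR_CPU_IDS 4u

/* Hypothetical logical-CPU-to-hart table; the values are illustrative only. */
static const unsigned int cpu_to_hart[NR_CPU_IDS] = { 1, 2, 3, 4 };

/*
 * Iterate over logical CPUs, which are always below NR_CPU_IDS, and return
 * a bitmask of the hart IDs that would be sent an IPI.
 */
static uint32_t harts_signalled(uint32_t cpu_mask)
{
	uint32_t harts = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (cpu_mask & (UINT32_C(1) << cpu))
			harts |= UINT32_C(1) << cpu_to_hart[cpu];
	return harts;
}
```

Under these assumptions, harts_signalled(0x0F) yields 0x1E, so hart 4 is reached even though its ID is not below NR_CPU_IDS, because the loop bound is only ever compared against logical CPU numbers.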