Re: [PATCH v2] irqchip/sifive-plic: Fix insufficient irq_groups allocation
From: Yangyu Chen
Date: Tue Feb 03 2026 - 01:37:50 EST
On 27/1/2026 23:58, Thomas Gleixner wrote:
On Wed, Jan 21 2026 at 13:45, Yangyu Chen wrote:
Since the first irq source is 1 instead of 0, when the number of
irqs is multiple of 32, the last irq group will be ignored during
allocation, saving, and restoring. This lead to memory corruption
when accessing enable_save beyond allocated memory after commit
14ff9e54dd14 ("irqchip/sifive-plic: Cache the interrupt enable state")
which will access enable_save for all sources during plic_probe.
Thus, we should allocate irq_groups based on (nr_irqs + 1) instead of
nr_irqs to avoid this issue.
This is an long standing bug since Linux v6.4 but since the last irq
source is rarely used, it may not be triggered in practice until commit
14ff9e54dd14 ("irqchip/sifive-plic: Cache the interrupt enable state").
I'm absolutely not convinced that this is the right fix. The handling of
nr_irqs in this driver is completely inconsistent:
1)
static int plic_irq_suspend(void *data)
/* irq ID 0 is reserved */
for (unsigned int i = 1; i < priv->nr_irqs; i++) {
2)
static void plic_irq_resume(void *data)
/* irq ID 0 is reserved */
for (unsigned int i = 1; i < priv->nr_irqs; i++) {
...
for (i = 0; i < DIV_ROUND_UP(priv->nr_irqs, 32); i++) {
3)
static irq_hw_number_t cp100_get_hwirq(struct plic_handler *handler, void __iomem *claim)
int nr_irq_groups = DIV_ROUND_UP(handler->priv->nr_irqs, 32);
for (i = 0; i < nr_irq_groups; i++) {
4)
static int plic_probe(struct fwnode_handle *fwnode)
priv->prio_save = bitmap_zalloc(nr_irqs, GFP_KERNEL);
...
for (int j = 0; j <= nr_irqs / 32; j++)
writel(0, enable_base + j);
...
handler->enable_save = kcalloc(DIV_ROUND_UP(nr_irqs, 32),...
...
for (hwirq = 1; hwirq <= nr_irqs; hwirq++) {
...
priv->irqdomain = irq_domain_create_linear(fwnode, nr_irqs + 1,
So can the SIFIVE people please clarify once and forever what's the
actual meaning of nr_irqs:
A) The actual number of hardware interrupts available for drivers
starting from hwirq=1
B) The total number of hardware interrupts, where hwirq=0 is reserved
and therefore the number of interrupts available to drivers is
nr_irqs - 1.
That needs to be clarified first and then subsequently every single
instance which deals with nr_irqs has to be updated accordingly.
Totally agree! But there is no comment after 1 week.
There's also some consideration of the actual device tree. I discovered that the Canaan K210 has only 64 interrupt sources (excluding 0), but its device tree (dts) has PLIC with ndev=65. [1] On the other hand, the ESWIN EIC7700 has 520 interrupt sources, but its dts has ndev=520. [2] This indicates that existing device trees already have some issues. Fortunately, other platforms like the cv1800b [3] have ndev=101 and the source ID 101 is used for mbox. Popular platforms like the JH7110 [4] and SG2042 [5] also show additional interrupts in their device trees that the SoC doesn’t actually use. In my opinion, the best practice would be option A and:
1. Note that the "riscv,ndev" value for PLIC represents the actual number of hardware interrupts available for drivers in the dt-binding. (This aligns with your option A.)
2. Fix all existing drivers to meet option A.
3. Since the K210 is only used in M-Mode with its built-in dts, we can modify its dts to change the "riscv,ndev" value from 65 to 64.
I will make such changes for this patch revision if there are no objections.
[1] https://cdn.hackaday.io/files/1654127076987008/kendryte_datasheet_20181011163248_en.pdf
[2] https://github.com/eswincomputing/EIC7700X-SoC-Technical-Reference-Manual/releases/download/v1.0.0-20250103/EIC7700X_SoC_Technical_Reference_Manual_Part1.pdf
[3] https://github.com/milkv-duo/duo-files/blob/main/duo/datasheet/CV1800B-CV1801B-Preliminary-Datasheet-full-en.pdf
[4] https://doc-en.rvspace.org/JH7110/PDF/JH7110_Datasheet.pdf
[5] https://github.com/milkv-pioneer/pioneer-files/blob/main/hardware/SG2042-TRM.pdf
Thanks,
Yangyu Chen
A random fix which cures the symptom of an out of bounds memory access
is not helpful at all to cure the underlying root cause of
inconsistency.
Thanks,
tglx