Re: [PATCH 1/1] riscv: dts: microchip: correct L2 cache interrupts
From: Conor.Dooley
Date: Thu Aug 18 2022 - 04:34:11 EST
On 18/08/2022 09:17, Heinrich Schuchardt wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
> On 8/18/22 09:03, Daire.McNamara@xxxxxxxxxxxxx wrote:
>> On Wed, 2022-08-17 at 18:04 +0000, Conor Dooley - M52691 wrote:
>>> Hey Heinrich,
>>> Interesting CC list you got there! Surprised the mailmap didn't sort
>>> out Atish & Krzysztof's addresses, but I think I've fixed them up.
>>> I see Daire isn't there either so +CC him too.
>>>
>>> On 17/08/2022 14:25, Heinrich Schuchardt wrote:
>>>> EXTERNAL EMAIL: Do not click links or open attachments unless you
>>>> know the content is safe
>>>>
>>>> The "PolarFire SoC MSS Technical Reference Manual" documents the
>>>> following PLIC interrupts:
>>>>
>>>> 1 - L2 Cache Controller Signals when a metadata correction event
>>>> occurs
>>>> 2 - L2 Cache Controller Signals when an uncorrectable metadata
>>>> event occurs
>>>> 3 - L2 Cache Controller Signals when a data correction event occurs
>>>> 4 - L2 Cache Controller Signals when an uncorrectable data event
>>>> occurs
>>>>
>>>> This differs from the SiFive FU540 which only has three L2 cache
>>>> related
>>>> interrupts.
>>>>
>>>> The sequence in the device tree is defined by an enum:
>> in drivers/soc/sifive/sifive_l2_cache.c
>>>>
>>>> enum {
>>>> DIR_CORR = 0,
>>>> DATA_CORR,
>>>> DATA_UNCORR,
>>>> DIR_UNCORR,
>>>> };
>>>
>>> Nit: more accurately by the dt-binding:
>>> interrupts:
>>> minItems: 3
>>> items:
>>> - description: DirError interrupt
>>> - description: DataError interrupt
>>> - description: DataFail interrupt
>>> - description: DirFail interrupt
>>>
>>> I do find the names in the enum to be a bit more understandable
>>> however,
>>> and ditto for the descriptions in our TRM... Maybe I should put that
>>> on
>>> my todo list of cleanups :)
>>>
>>>
>>>> So the correct sequence of the L2 cache interrupts is
>>>>
>>>> interrupts = <1>, <3>, <4>, <2>;
>>>
>>> This looks correct to me. You mentioned on IRC that what you were
>>> seeing
>>> was a wall of
>>> L2CACHE: DataFail @ 0x00000000.0807FFD8
>>> From a quick look at the driver, what seems to be happening here is
>>> that
>>> at some point (possibly before Linux even comes into the picture)
>>> there
>>> is an uncorrectable data error. Because the ordering in the dt is
>>> wrong,
>>> we read the wrong register and so the interrupt is never actually
>>> cleared. With this patch applied, I see a single DataFail right as
>>> the
>>> interrupt gets registed & nothing after that.
>>>
>>> I am not really sure what value there is in enabling that driver
>>> though,
>>> mostly just seems like a debugging tool & from our pov we would see
>>> the
>>> HSS running in the monitor core as being responsible for handling the
>>> l2-cache errors.
>>>
>>> @Daire, maybe you have an opinion here?
>> Likewise. The new ordering of the interrupts to what the driver expects
>> looks correct - as far as it goes. However, I'm not convinced enabling
>> the SiFive l2 cache driver out of the box makes sense. Using l2 cache
>> driver doesn't align terribly well with the current MPFS roadmap for
>> mgt of ECC errors.
>>>
>>> Patch LGTM, so I'll likely apply it in the next day or two, would
>>> just
>>> like to see what Daire has to say first.
>> If l2-cache controller is enabled, then interrupts should be connected
>> as per TRM. I think this specific patch lgtm, ideally with a
>> 'disabled' stanza and it's up to individual MPFS customers/boards to
>> enable l2 cache controller if they want it.
>
> Disabling the device in the device-tree is orthogonal to fixing the
> interrupt sequence. I would suggest that you use a separate patch for
> adding status = "disabled";.
Aye, not wrong there. At least from me, it was an observation on the
way you discovered that the bug existed. I'll apply this patch today
so - thanks for fixing it!
Conor.
>
> Best regards
>
> Heinrich
>
>>>
>>>> Fixes: e35b07a7df9b ("riscv: dts: microchip: mpfs: Group tuples in
>>>> interrupt properties")
>>>
>>> BTW, it isn't really fixing this patch right? This is a dependency
>>> for
>>> backports to 5.15.
>>>
>>> Thanks for your patch,
>>> Conor.
>>>
>>>> Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE
>>>> board")
>>>> Cc: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>> Signed-off-by: Heinrich Schuchardt <
>>>> heinrich.schuchardt@xxxxxxxxxxxxx>
>>>> ---
>>>> arch/riscv/boot/dts/microchip/mpfs.dtsi | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/riscv/boot/dts/microchip/mpfs.dtsi
>>>> b/arch/riscv/boot/dts/microchip/mpfs.dtsi
>>>> index 496d3b7642bd..ec1de6344be9 100644
>>>> --- a/arch/riscv/boot/dts/microchip/mpfs.dtsi
>>>> +++ b/arch/riscv/boot/dts/microchip/mpfs.dtsi
>>>> @@ -169,7 +169,7 @@ cctrllr: cache-controller@2010000 {
>>>> cache-size = <2097152>;
>>>> cache-unified;
>>>> interrupt-parent = <&plic>;
>>>> - interrupts = <1>, <2>, <3>;
>>>> + interrupts = <1>, <3>, <4>, <2>;
>>>> };
>>>>
>>>> clint: clint@2000000 {
>>>> --
>>>> 2.36.1
>>>>
>