Re: [Bisected] altera_edac crash on a system without ECC

From: Aaro Koskinen
Date: Wed Dec 04 2019 - 08:25:47 EST


Hi,

On Tue, Dec 03, 2019 at 05:09:48PM -0600, Thor Thayer wrote:
> On 11/29/19 10:57 AM, Aaro Koskinen wrote:
> >Hi,
> >
> >I tried booting v5.4 mainline kernel on a stratix10 board with ECC
> >disabled, and the altera-edac driver (with only SDRAM enabled) is
> >now crashing the system instead of failing the probe with "No ECC/ECC
> >disabled".
> >
> I apologize for the late reply. I was on vacation.
>
> ECC disabled means the sof/jic that you're loading has ECC disabled,
> correct?

Yes.

> >This seems to have started with commit 08f08bfb7b4c ("EDAC, altera:
> >Merge Stratix10 into the Arria10 SDRAM probe routine"). With the change,
> >looks like sdram probe no longer uses SMC calls and instead accesses
> >the registers directly. The crash looks like this:
>
> I haven't seen this. I'd expect both ECC enabled and disabled to fail with
> the dumps you have below since they'd both need to use the regmap functions.

With ECC enabled it doesn't fail, as the direct register access appears
to work then (I also checked by reading 0xf8011101 from userspace -
and it works without an abort).

> Yes, this does look like it is using the register accesses instead of the
> SMC call. Line 2206 sets the SMC call after determining from the if()
> statement if it is a Stratix10 or Arria10 and from below it seems to take
> the Arria10 path.

But that's setting the ecc_mgr_map. I think that altr_check_ecc_deps
and altr_sdram_probe use a different mapping. Before commit 08f08bfb7b4c
there was S10 specific altr_s10_sdram_probe() that took care of the SMC,
but I cannot see how the current code doing that unless I'm missing some
special magic.

> The altr_check_ecc_deps() call is checking whether ECC is enabled so the
> probe should fail.
>
> I suspect the device tree. Can you verify the following node is in your
> device tree?
>
> sdramedac {
> compatible = "altr,sdram-edac-s10";
> altr,sdr-syscon = <&sdr>;
> interrupts = <16 4>;
> };

Yes, I'm using the in-tree socfpga_stratix10.dtsi.

A.