RE: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500
From: Hongtao Jia
Date: Tue Oct 07 2014 - 23:26:20 EST
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Wednesday, October 01, 2014 8:44 AM
> To: Guenter Roeck
> Cc: Jojy Varghese; Benjamin Herrenschmidt; Paul Mackerras; Michael
> Ellerman; linuxppc-dev@xxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> Guenter Roeck; Jia Hongtao-B38951
> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> exception on E500MC / E5500
>
> On Tue, 2014-09-30 at 08:50 -0700, Guenter Roeck wrote:
> > On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> > > On Mon, 2014-09-29 at 23:03 +0000, Jojy Varghese wrote:
> > > >
> > > > On 9/29/14 12:06 PM, "Guenter Roeck" <linux@xxxxxxxxxxxx> wrote:
> > > >
> > > > >Those are errors related to PCIe hotplug, and are seen with
> > > > >unexpected PCIe device removals (triggered, for example, by
> > > > >removing power from a PCIe adapter).
> > > > >The behavior we see on E5500 is quite similar to the same
> > > > >behavior on E500:
> > > > >If unhandled, the CPU keeps executing the same instruction over
> > > > >and over again if there is an error on a PCIe access and thus
> > > > >stalls. I don't know if this is considered an erratum or expected
> > > > >behavior, but it is one we have to address since we have to be
> > > > >able to handle that condition.
> > >
> > > The reason I ask is that the handling for e500 was described as an
> > > erratum workaround. If it is an erratum it would be nice to know
> > > the erratum number and the full list of affected chips.
> > >
> > My understanding, which may be wrong, was that this is expected
> > behavior, at least for E5500. I actually thought I had seen it
> > somewhere in the specification (response to PCIe errors), but I don't
> > recall where exactly.
> >
> > At least for my part I am not aware of an erratum.
>
> Jia Hongtao, can you comment here?
I did not find any related erratum either.
>
> > > > >Ultimately, we'll want to implement PCIe error handlers for the
> > > > >affected drivers, but that will be a next step.
> > >
> > > For now can we at least print a ratelimited error message? I don't
> > > like the idea of silently ignoring these errors. I suppose it's a
> > > separate issue from extending the workaround to cover e500mc, though.
> > >
> > I don't really like the idea of printing an error message pretty much
> > each time when an unexpected hotplug event occurs.
>
> Unexpected events seem like the sort of thing you'd want to log, but my
> concern is that this might not be the only cause of PCI errors.
>
> -Scott
>
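
For reference, the ratelimited message Scott asks for above could look
roughly like the sketch below. This is a userspace illustration only, not
code from the patch: the struct, ratelimit_ok() and report_pci_mcheck() are
hypothetical names, and the time is passed in explicitly so the behavior is
easy to follow. In the kernel itself one would presumably use the existing
pr_err_ratelimited() helper rather than open-coding this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a ratelimit state. */
struct ratelimit {
	long interval;	/* window length, in seconds */
	int burst;	/* messages allowed per window */
	long begin;	/* start time of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
};

/* Return true if a message may be logged at time 'now' (seconds). */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	assert(rl->interval > 0 && rl->burst > 0);

	/* Start a new window once the old one has expired. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	rl->missed++;
	return false;
}

/*
 * Hypothetical reporting hook: logs the faulting address unless the
 * limiter says to stay quiet. Returns true if a message was printed.
 */
static bool report_pci_mcheck(struct ratelimit *rl, long now,
			      unsigned long addr)
{
	if (!ratelimit_ok(rl, now))
		return false;

	fprintf(stderr, "PCI machine check at 0x%lx\n", addr);
	return true;
}
```

With interval = 5 and burst = 2, a burst of errors from one surprise
removal would produce at most two lines per five seconds, and the rest
would only be counted in 'missed'. That keeps the log usable while still
leaving a trace if something other than hotplug is causing PCI errors.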