Re: [FYI] GCC segfaults under heavy multithreaded compilation with AMD Ryzen

From: Markus Trippelsdorf
Date: Mon Jul 31 2017 - 08:22:46 EST


On 2017.07.31 at 13:04 +0100, Alan Cox wrote:
> On Wed, 26 Jul 2017 06:54:01 +0900
> Satoru Takeuchi <satoru.takeuchi@xxxxxxxxx> wrote:
>
> > # I'm a LKML subscriber, but not a x86 list subscriber
> >
> > I found the following new Linux kernel bugzilla entry about a
> > Ryzen-related problem. Since many developers don't check this
> > bugzilla and I've also encountered this problem, I decided to
> > report it here.
>
> Historically we've seen exactly these symptoms on all kinds of systems
> where the memory is at fault, even in cases where memtest86 passes.
> Whether there's a specific problem on some Ryzen boards is a question for
> AMD, but if I saw this without knowing the CPU I'd suspect memory
> first. GCC, it turns out, is by accident an amazingly effective memory
> testing tool.
>
> If it is a memory corruption problem, then no - the kernel cannot work
> around that level of hardware failure. The BIOS may be able to if it is a
> board or compatibility problem, as memory tuning is usually done by the
> BIOS.

People are seeing these segfaults even with ECC memory (and EDAC
enabled). There are no ECC-related MCEs in their logs.

Also, for some people the segfaults went away after they RMAed their CPU.
Others are not so lucky and still see segfaults after the RMA.

To me it looks like a chip binning issue.

--
Markus