Re: [Qemu-devel] [RFH] qemu-2.6 memory corruption with OVMF and linux-4.9

From: Dr. David Alan Gilbert
Date: Sun Jun 18 2017 - 15:06:50 EST


* Philipp Hahn (hahn@xxxxxxxxxxxxx) wrote:
> Hello,
>
> Am 17.06.2017 um 18:51 schrieb Laszlo Ersek:
> > (I also recommend using the "vbindiff" tool for such problems, it is
> > great for picking out patterns.)
> >
> > ** ** ** ** ** ** ** ** 8 9 ** ** ** 13 14 15
> > -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> > 00000000 01 e8 00 00 00 00 00 00 8c 5e 00 00 00 10 ff f1
> > 00000010 5b 78 8a 3e 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000020 8c 77 00 00 00 12 00 02 18 f0 00 00 00 00 00 00
> > 00000030 00 1e 00 00 00 00 00 00 8c 8c 00 00 00 12 00 02
> > 00000040 07 70 00 00 00 00 00 00 00 14 00 00 00 00 00 00
> > 00000050 8c 9c 00 00 00 12 00 02 22 00 00 00 00 00 00 00
> > 00000060 00 40 00 00 00 00 00 00 8c ac 00 00 00 10 ff f1
> >
> > 00000000 01 e8 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
> > 00000010 5b 78 8a 3e 00 00 00 00 00 3c 00 00 00 07 00 00
> > 00000020 8c 77 00 00 00 12 00 02 00 3c 00 00 00 07 00 00
> > 00000030 00 1e 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
> > 00000040 07 70 00 00 00 00 00 00 00 3c 00 00 00 07 00 00
> > 00000050 8c 9c 00 00 00 12 00 02 00 3c 00 00 00 07 00 00
> > 00000060 00 40 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
> > -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> > ** ** ** ** ** ** ** ** 8 9 ** ** ** 13 14 15
> >
> > The columns that I marked with "**" are identical between "good" and
> > "bad". (These are columns 0-7 and 10-12.)
> >
> > Column 8 is overwritten by zeros (every 16th byte).
> >
> > Column 9 is overwritten by 0x3c (every 16th byte).
> >
> > Column 13 is super interesting. The most significant nibble in that
> > column is not disturbed. And, in the least significant nibble, the least
> > significant three bits are turned on. Basically, the corruption could be
> > described, for this column (i.e., every 16th byte), as
> >
> > bad = good | 0x7
> >
> > Column 14 is overwritten by zeros (every 16th byte).
> >
> > Column 15 is overwritten by zeros (every 16th byte).
> >
> > My take is that your host machine has faulty RAM. Please run memtest86+
> > or something similar.
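The column-by-column reading above can be checked mechanically. Below is a minimal sketch of a hypothetical helper in Python. It is not vbindiff and nothing in this thread uses it. It parses the two quoted dumps, drops the offset column, and labels each column (every 16th byte) as identical, overwritten with a constant, or matching "bad = good | mask". The function names and the three categories are assumptions made for illustration.

```python
"""Hypothetical helper: classify per-column corruption between two hex dumps.

The GOOD/BAD rows are the ones quoted above; the offset column is dropped.
"""

GOOD = """\
00000000 01 e8 00 00 00 00 00 00 8c 5e 00 00 00 10 ff f1
00000010 5b 78 8a 3e 00 00 00 00 00 00 00 00 00 00 00 00
00000020 8c 77 00 00 00 12 00 02 18 f0 00 00 00 00 00 00
00000030 00 1e 00 00 00 00 00 00 8c 8c 00 00 00 12 00 02
00000040 07 70 00 00 00 00 00 00 00 14 00 00 00 00 00 00
00000050 8c 9c 00 00 00 12 00 02 22 00 00 00 00 00 00 00
00000060 00 40 00 00 00 00 00 00 8c ac 00 00 00 10 ff f1
"""

BAD = """\
00000000 01 e8 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
00000010 5b 78 8a 3e 00 00 00 00 00 3c 00 00 00 07 00 00
00000020 8c 77 00 00 00 12 00 02 00 3c 00 00 00 07 00 00
00000030 00 1e 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
00000040 07 70 00 00 00 00 00 00 00 3c 00 00 00 07 00 00
00000050 8c 9c 00 00 00 12 00 02 00 3c 00 00 00 07 00 00
00000060 00 40 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
"""


def parse_dump(text):
    """Turn 'offset b0 b1 ... b15' lines into lists of 16 ints."""
    return [[int(tok, 16) for tok in line.split()[1:]]
            for line in text.strip().splitlines()]


def classify_column(good_col, bad_col):
    """Describe how one column changed from the good to the bad dump."""
    if good_col == bad_col:
        return "identical"
    if len(set(bad_col)) == 1:
        return f"overwritten with 0x{bad_col[0]:02x}"
    # If bad == good | m holds for any mask m, it also holds for the bits
    # common to every bad byte, so testing that AND is sufficient.
    mask = 0xff
    for b in bad_col:
        mask &= b
    if all(b == (g | mask) for g, b in zip(good_col, bad_col)):
        return f"bad = good | 0x{mask:02x}"
    return "other"


def classify(good_text, bad_text):
    """Map each column index 0..15 to a description of its corruption."""
    good, bad = parse_dump(good_text), parse_dump(bad_text)
    return {col: classify_column([row[col] for row in good],
                                 [row[col] for row in bad])
            for col in range(16)}


if __name__ == "__main__":
    for col, verdict in classify(GOOD, BAD).items():
        print(f"column {col:2d}: {verdict}")
```

On the quoted rows this reports columns 0-7 and 10-12 as identical, columns 8, 14 and 15 as overwritten with 0x00, column 9 as overwritten with 0x3c, and column 13 as "bad = good | 0x07". That matches the analysis above.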
>
> I will do so, but I consider it very unlikely:
> - it never happens with BIOS, only with OVMF
> - for each test I start a new QEMU process, which should use a different
> memory region
> - it repeatedly hits e1000 or libata.ko
>
> After updating OVMF from 0~20160813.de74668f-2 to 0~20161202.7bbe0b3e-1,
> it has not happened again yet.
>
> Anyway, thank you for your help.

What host CPU are you using?

Dave

>
> Philipp
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/