Re: [Qemu-devel] [RFH] qemu-2.6 memory corruption with OVMF and linux-4.9

From: Philipp Hahn
Date: Sun Jun 18 2017 - 15:54:33 EST


On 18.06.2017 at 20:27, Dr. David Alan Gilbert wrote:
> * Philipp Hahn (hahn@xxxxxxxxxxxxx) wrote:
>> Hello,
>>
>> On 17.06.2017 at 18:51, Laszlo Ersek wrote:
>>> (I also recommend using the "vbindiff" tool for such problems, it is
>>> great for picking out patterns.)
>>>
>>> ** ** ** ** ** ** ** ** 8 9 ** ** ** 13 14 15
>>> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
>>> 00000000 01 e8 00 00 00 00 00 00 8c 5e 00 00 00 10 ff f1
>>> 00000010 5b 78 8a 3e 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00000020 8c 77 00 00 00 12 00 02 18 f0 00 00 00 00 00 00
>>> 00000030 00 1e 00 00 00 00 00 00 8c 8c 00 00 00 12 00 02
>>> 00000040 07 70 00 00 00 00 00 00 00 14 00 00 00 00 00 00
>>> 00000050 8c 9c 00 00 00 12 00 02 22 00 00 00 00 00 00 00
>>> 00000060 00 40 00 00 00 00 00 00 8c ac 00 00 00 10 ff f1
>>>
>>> 00000000 01 e8 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
>>> 00000010 5b 78 8a 3e 00 00 00 00 00 3c 00 00 00 07 00 00
>>> 00000020 8c 77 00 00 00 12 00 02 00 3c 00 00 00 07 00 00
>>> 00000030 00 1e 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
>>> 00000040 07 70 00 00 00 00 00 00 00 3c 00 00 00 07 00 00
>>> 00000050 8c 9c 00 00 00 12 00 02 00 3c 00 00 00 07 00 00
>>> 00000060 00 40 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
>>> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
>>> ** ** ** ** ** ** ** ** 8 9 ** ** ** 13 14 15
>>>
>>> The columns that I marked with "**" are identical between "good" and
>>> "bad". (These are columns 0-7, 10-12.)
>>>
>>> Column 8 is overwritten by zeros (every 16th byte).
>>>
>>> Column 9 is overwritten by 0x3c (every 16th byte).
>>>
>>> Column 13 is super interesting. The most significant nibble in that
>>> column is not disturbed. And, in the least significant nibble, the least
>>> significant three bits are turned on. Basically, the corruption could be
>>> described, for this column (i.e., every 16th byte), as
>>>
>>> bad = good | 0x7
>>>
>>> Column 14 is overwritten by zeros (every 16th byte).
>>>
>>> Column 15 is overwritten by zeros (every 16th byte).
>>>
>>> My take is that your host machine has faulty RAM. Please run memtest86+
>>> or something similar.
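The column-by-column description above can be checked mechanically against the two dumps. Below is a minimal sketch in Python, assuming the seven rows are transcribed exactly as shown and that the model (columns 8, 14 and 15 zeroed, column 9 forced to 0x3c, column 13 OR'ed with 0x7) is the complete story. The helper names `parse_dump`, `changed_columns` and `corrupt_row` are hypothetical and not part of any QEMU or OVMF tool.

```python
"""Check the per-row corruption model against the two vbindiff dumps.

Hypothetical helpers for illustration only: parse_dump(),
changed_columns(), corrupt_row().
"""

# Offsets 0x00..0x60 of the "good" dump, one 16-byte row per line.
GOOD = """
01 e8 00 00 00 00 00 00 8c 5e 00 00 00 10 ff f1
5b 78 8a 3e 00 00 00 00 00 00 00 00 00 00 00 00
8c 77 00 00 00 12 00 02 18 f0 00 00 00 00 00 00
00 1e 00 00 00 00 00 00 8c 8c 00 00 00 12 00 02
07 70 00 00 00 00 00 00 00 14 00 00 00 00 00 00
8c 9c 00 00 00 12 00 02 22 00 00 00 00 00 00 00
00 40 00 00 00 00 00 00 8c ac 00 00 00 10 ff f1
"""

# The same offsets in the "bad" dump.
BAD = """
01 e8 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
5b 78 8a 3e 00 00 00 00 00 3c 00 00 00 07 00 00
8c 77 00 00 00 12 00 02 00 3c 00 00 00 07 00 00
00 1e 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
07 70 00 00 00 00 00 00 00 3c 00 00 00 07 00 00
8c 9c 00 00 00 12 00 02 00 3c 00 00 00 07 00 00
00 40 00 00 00 00 00 00 00 3c 00 00 00 17 00 00
"""


def parse_dump(text: str) -> list[bytes]:
    """Turn one space-separated hex row per line into 16-byte rows."""
    rows = [bytes.fromhex(line) for line in text.splitlines() if line.strip()]
    assert all(len(row) == 16 for row in rows)
    return rows


def changed_columns(good: list[bytes], bad: list[bytes]) -> list[int]:
    """Column indices (0..15) where at least one row differs."""
    return [col for col in range(16)
            if any(g[col] != b[col] for g, b in zip(good, bad))]


def corrupt_row(row: bytes) -> bytes:
    """Apply the model read off the dumps to one 16-byte row."""
    out = bytearray(row)
    out[8] = 0x00       # column 8: zeroed
    out[9] = 0x3C       # column 9: forced to 0x3c
    out[13] |= 0x07     # column 13: low three bits set, high nibble kept
    out[14] = 0x00      # column 14: zeroed
    out[15] = 0x00      # column 15: zeroed
    return bytes(out)


good_rows = parse_dump(GOOD)
bad_rows = parse_dump(BAD)
print(changed_columns(good_rows, bad_rows))             # [8, 9, 13, 14, 15]
print([corrupt_row(r) for r in good_rows] == bad_rows)  # True
```

A match only shows that the description fits these seven rows; it does not by itself distinguish faulty RAM from a stray write by software.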
>>
>> I will do so, but it seems very unlikely to me:
>> - it never happens with BIOS, only with OVMF
>> - for each test I start a new QEMU process, which should use a different
>> memory region
>> - it repeatedly hits e1000 or libata.ko
>>
>> Since updating OVMF from 0~20160813.de74668f-2 to 0~20161202.7bbe0b3e-1,
>> it has not happened again yet.
>>
>> Anyway, thank you for your help.
>
> What host CPU are you using?

Everything is amd64:
> processor : 3
> vendor_id : GenuineIntel
> cpu family : 6
> model : 58
> model name : Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz
> stepping : 9
> microcode : 0x19
> cpu MHz : 2591.015
> cache size : 3072 KB
> physical id : 0
> siblings : 4
> core id : 1
> cpu cores : 2
> apicid : 3
> initial apicid : 3
> fpu : yes
> fpu_exception : yes
> cpuid level : 13
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
> bugs :
> bogomips : 3592.75
> clflush size : 64
> cache_alignment : 64
> address sizes : 36 bits physical, 48 bits virtual
> power management:

Philipp