Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
From: bhe@xxxxxxxxxx
Date: Tue Nov 03 2020 - 09:04:55 EST
On 11/03/20 at 12:34pm, Rahul Gopakumar wrote:
> >> So, you mean with the draft patch applied, the initial performance
> >> regression goes away, just many page corruption errors with call trace
> >> are seen, right?
>
> Yes, that's right.
>
> >> And the performance regression is about 2sec delay in
> >> your system?
>
> The delay due to this new page corruption issue is about
> 3 secs.
>
> Here is the summary
>
> * Initial problem - 2 secs
> * Draft patch - Fixes initial problem (recovers 2 secs) but
> brings in new page corruption issue (3 secs)
>
> >> Could you tell how you setup vmware VM so that I can ask our QA for
> >> help to create a vmware VM for me to test?
>
> * Use vSphere ESXi 6.7 or 7.0 GA.
> * Create VM using vSphere Web Client and specify 1TB VM Memory.
> * Install RHEL 8.1, that's the guest used in this test.
OK, I see. The draft patch fixes the original issue, but it seems some
boundary of a memory region is not handled correctly. Thanks for the
confirmation.
The memory layout is important in this case. I'm not sure whether creating
a VM guest as you suggested will also produce a system with the memory
layout below.
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[ 0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[ 0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[ 0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[ 0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]
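
To make the layout above easier to compare against another machine, here is a
rough sketch that parses the SRAT lines, sums the memory per node, and lists
the holes between adjacent ranges. The helper names (parse_srat, node_sizes,
find_holes) are made up for illustration and are not part of any kernel tool;
the only input is the log excerpt quoted above.

```python
import re

# SRAT lines copied verbatim from the boot log quoted in this mail.
SRAT_LOG = """\
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[ 0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[ 0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[ 0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[ 0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]
"""

# Hypothetical pattern for this log format: node id, then inclusive start/end.
SRAT_RE = re.compile(r"Node (\d+) PXM \d+ \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]")


def parse_srat(log):
    """Return a list of (node, start, end) tuples; end is inclusive."""
    return [
        (int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16))
        for m in SRAT_RE.finditer(log)
    ]


def node_sizes(ranges):
    """Sum the bytes covered by each node's ranges."""
    sizes = {}
    for node, start, end in ranges:
        sizes[node] = sizes.get(node, 0) + (end - start + 1)
    return sizes


def find_holes(ranges):
    """Return (hole_start, hole_end, node_before, node_after) for each gap."""
    ordered = sorted(ranges, key=lambda r: r[1])
    holes = []
    for (n1, _s1, e1), (n2, s2, _e2) in zip(ordered, ordered[1:]):
        if s2 > e1 + 1:
            holes.append((e1 + 1, s2 - 1, n1, n2))
    return holes


if __name__ == "__main__":
    ranges = parse_srat(SRAT_LOG)
    for node, size in sorted(node_sizes(ranges).items()):
        print(f"node {node}: {size / 2**30:.2f} GiB")
    for start, end, before, after in find_holes(ranges):
        print(f"hole {start:#x}-{end:#x} ({(end - start + 1) / 2**20:.0f} MiB) "
              f"between node {before} and node {after}")
```

By this arithmetic the ranges add up to just under 1 TiB, and node 2 is split
by a 12 GiB hole at 0xfd00000000-0xffffffffff. Whether that hole is the
boundary being mishandled is not established here; the sketch only makes the
layout easier to check.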
>
> With draft patch, you should be able to reproduce the issue.
> Let me know if you need more details.