Re: [PATCH -V6 00/21] swap: Swapout/swapin THP in one piece
From: Huang, Ying
Date: Tue Oct 23 2018 - 23:31:54 EST
Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> writes:
> On Wed, Oct 10, 2018 at 03:19:03PM +0800, Huang Ying wrote:
>> And for all, any comment is welcome!
>> This patchset is based on the 2018-10-3 head of mmotm/master.
> There seems to be some infrequent memory corruption with THPs that have been
> swapped out: page contents differ after swapin.
Thanks a lot for testing this! I know there was a big effort behind this,
and it will definitely improve the quality of the patchset greatly!
> Reproducer at the bottom. Part of some tests I'm writing, had to separate it a
> little hack-ily. Basically it writes the word offset _at_ each word offset in
> a memory blob, tries to push it to swap, and verifies the offset is the same
> after swapin.
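A minimal C sketch of the fill-and-verify idea described above. This is not Daniel's reproducer, which is not reproduced here. `fill_offsets` and `count_corrupt_words` are hypothetical names, the sketch assumes 8-byte words (matching the 8-byte stride in the output below), and it leaves out the actual swap-out step and the mincore() check.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: store each word's own byte offset in that word. */
static void fill_offsets(uint64_t *buf, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		buf[i] = (uint64_t)(i * sizeof(uint64_t));
}

/*
 * Hypothetical helper: after the memory has been swapped out and back in,
 * report every word whose value no longer equals its own byte offset.
 * Returns the number of corrupted words.
 */
static size_t count_corrupt_words(const uint64_t *buf, size_t nwords)
{
	size_t bad = 0;

	for (size_t i = 0; i < nwords; i++) {
		uint64_t expected = (uint64_t)(i * sizeof(uint64_t));

		if (buf[i] != expected) {
			printf("** corruption detected %llu bytes in (got %llu, expected %llu) **\n",
			       (unsigned long long)expected,
			       (unsigned long long)buf[i],
			       (unsigned long long)expected);
			bad++;
		}
	}
	return bad;
}
```

Usage: call `fill_offsets()` on the blob, push it to swap by whatever means the test uses, touch it again to swap it in, then call `count_corrupt_words()`.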
> I ran with THP enabled=always. THP swapin_enabled could be always or never, it
> happened with both. Every time swapping occurred, a single THP-sized chunk in
> the middle of the blob had different offsets. Example:
> ** word corruption gap
> ** corruption detected 14929920 bytes in (got 15179776, expected 14929920) **
> ** corruption detected 14929928 bytes in (got 15179784, expected 14929928) **
> ** corruption detected 14929936 bytes in (got 15179792, expected 14929936) **
> ...pattern continues...
> ** corruption detected 17027048 bytes in (got 15179752, expected 17027048) **
> ** corruption detected 17027056 bytes in (got 15179760, expected 17027056) **
> ** corruption detected 17027064 bytes in (got 15179768, expected 17027064) **
The wrong values all fall inside the corrupted range itself:
14929920 <= 15179xxx <= 17027064
15179776 % 4096 = 0
And 15179776 = 15179768 + 8
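That is, the 2MB chunk seems to hold the right data, only rotated by a
whole number of pages (15179776 - 14929920 = 249856 = 61 * 4096) and
wrapping around at the end. So I guess we have some alignment bug.
Could you try the attached patches? They deal with some alignment issues.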
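The observations above can be checked against the quoted log with a small model: the corrupted 2MB chunk still holds the right offsets, just rotated by a whole number of pages. This is a hypothetical C sketch, not code from the patchset. `rotated_value`, `fits_rotation`, `THP_SIZE` and `BASE_PAGE_SIZE` are illustrative names, and 2MB THPs over 4KB base pages (the x86-64 defaults) are assumed. The chunk start (14929920) and the shift (15179776 - 14929920 = 249856 bytes) are taken from the log lines.

```c
#include <stdbool.h>
#include <stdint.h>

#define THP_SIZE	(2UL << 20)	/* assumed: 2MB THP */
#define BASE_PAGE_SIZE	4096UL		/* assumed: 4KB base pages */

/*
 * Model: the word at 'off' holds the offset that should live 'shift'
 * bytes further into the same THP-sized chunk, wrapping at its end.
 */
static uint64_t rotated_value(uint64_t start, uint64_t shift, uint64_t off)
{
	return start + (off - start + shift) % THP_SIZE;
}

/* Does one reported (offset, got) pair fit the rotation model? */
static bool fits_rotation(uint64_t start, uint64_t shift,
			  uint64_t off, uint64_t got)
{
	return rotated_value(start, shift, off) == got;
}
```

With start = 14929920 and shift = 249856, every quoted pair fits, including the last one (17027064, got 15179768). The corrupted range is exactly THP_SIZE long and the shift is 61 base pages. That is consistent with an alignment bug, though it does not by itself show where the misalignment happens.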
> 100.0% of memory was swapped out at mincore time
> 0.00305% of pages were corrupted (first corrupt word 14929920, last corrupt word 17027064)
> The problem goes away with THP enabled=never, and I don't see it on 2018-10-3
> mmotm/master with THP enabled=always.
> The server had an NVMe swap device and ~760G memory over two nodes, and the
> program was always run like this: swap-verify -s $((64 * 2**30))
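As a sanity check, the reported 0.00305% is what one THP-sized chunk in a 64GB blob works out to, assuming 4KB base pages and 2MB THPs: 512 corrupted pages out of 16777216 is about 0.0030518%. `corrupt_page_percent` below is a hypothetical helper for that arithmetic, not part of the test program.

```c
#include <stdint.h>

/* Percentage of base pages covered by 'corrupt_bytes' in a 'blob_bytes' blob. */
static double corrupt_page_percent(uint64_t corrupt_bytes, uint64_t blob_bytes,
				   uint64_t page_size)
{
	uint64_t bad_pages = corrupt_bytes / page_size;
	uint64_t total_pages = blob_bytes / page_size;

	return 100.0 * (double)bad_pages / (double)total_pages;
}
```

For example, `corrupt_page_percent(2ULL << 20, 64ULL << 30, 4096)` gives about 0.0030518, matching the "single THP-sized chunk" observation.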
> The kernels had one extra patch, Alexander Duyck's
> "dma-direct: Fix return value of dma_direct_supported", which was required to
> get them to build.