Re: XFS memory allocation deadlock in 2.6.38

From: Christoph Hellwig
Date: Thu Mar 24 2011 - 13:43:18 EST


Michel,

can you take a look at this bug report? It looks like a regression
in your mlock handling changes.


On Wed, Mar 23, 2011 at 03:39:05PM -0400, Sean Noonan wrote:
> I believe this patch fixes the behavior:
> diff --git a/mm/memory.c b/mm/memory.c
> index e48945a..740d5ab 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3461,7 +3461,9 @@ int make_pages_present(unsigned long addr, unsigned long end)
>  	 * to break COW, except for shared mappings because these don't COW
>  	 * and we would not want to dirty them for nothing.
>  	 */
> -	write = (vma->vm_flags & (VM_WRITE | VM_SHARED)) == VM_WRITE;
> +	write = (vma->vm_flags & VM_WRITE) != 0;
> +	if (write && ((vma->vm_flags & VM_SHARED) !=0) && (vma->vm_file == NULL))
> +		write = 0;
>  	BUG_ON(addr >= end);
>  	BUG_ON(end > vma->vm_end);
>  	len = DIV_ROUND_UP(end, PAGE_SIZE) - addr/PAGE_SIZE;
>
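For anyone following along, the change this patch makes to the write-fault decision can be tabulated with a small userspace sketch. It only mirrors the two expressions from the hunk above and is not kernel code. The function names write_fault_before() and write_fault_after() and the has_file parameter (standing in for vma->vm_file != NULL) are made up for illustration. The flag values are copied here so that the sketch builds without kernel headers.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's vm_flags bits, copied here so the sketch
 * compiles outside the kernel tree. */
#define VM_WRITE  0x00000002UL
#define VM_SHARED 0x00000008UL

/* Expression removed by the patch (the 2.6.38 behavior): request a write
 * fault only for writable mappings that are not shared. */
static bool write_fault_before(unsigned long vm_flags)
{
	return (vm_flags & (VM_WRITE | VM_SHARED)) == VM_WRITE;
}

/* Expression added by the patch: request a write fault for every writable
 * mapping, except a shared one with no backing file.
 * has_file is a hypothetical stand-in for (vma->vm_file != NULL). */
static bool write_fault_after(unsigned long vm_flags, bool has_file)
{
	bool write = (vm_flags & VM_WRITE) != 0;

	if (write && (vm_flags & VM_SHARED) != 0 && !has_file)
		write = false;
	return write;
}
```

The two versions differ in exactly one case: a writable, shared vma that has a backing file. For that case write_fault_before() returns false and write_fault_after() returns true. Private writable mappings get a write fault either way, and read-only mappings and shared mappings without a file get none either way.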
>
> This was traced to the following commit:
> 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272 is the first bad commit
> commit 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272
> Author: Michel Lespinasse <walken@xxxxxxxxxx>
> Date: Thu Jan 13 15:46:09 2011 -0800
>
> mlock: avoid dirtying pages and triggering writeback
>
> When faulting in pages for mlock(), we want to break COW for anonymous or
> file pages within VM_WRITABLE, non-VM_SHARED vmas. However, there is no
> need to write-fault into VM_SHARED vmas since shared file pages can be
> mlocked first and dirtied later, when/if they actually get written to.
> Skipping the write fault is desirable, as we don't want to unnecessarily
> cause these pages to be dirtied and queued for writeback.
>
> Signed-off-by: Michel Lespinasse <walken@xxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: Kosaki Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Nick Piggin <npiggin@xxxxxxxxx>
> Cc: Theodore Tso <tytso@xxxxxxxxxx>
> Cc: Michael Rubin <mrubin@xxxxxxxxxx>
> Cc: Suleiman Souhlal <suleiman@xxxxxxxxxx>
> Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>
> :040000 040000 604eede2f45b7e5276ce9725b715ed15a868861d 3c175eadf4cf33d4f78d4d455c9a04f3df2c199e M mm
>
>
> -----Original Message-----
> From: Sean Noonan
> Sent: Monday, March 21, 2011 12:20
> To: 'linux-kernel@xxxxxxxxxxxxxxx'
> Cc: Trammell Hudson; Martin Bligh; Stephen Degler; Christos Zoulas
> Subject: XFS memory allocation deadlock in 2.6.38
>
> This message was originally posted to the XFS mailing list, but received no responses. Thus, I am sending it to LKML on the advice of Martin.
>
> Using the attached program, we are able to reproduce this bug reliably.
> $ make vmtest
> $ ./vmtest /xfs/hugefile.dat $(( 16 * 1024 * 1024 * 1024 )) # vmtest <path_to_file> <size_in_bytes>
> /xfs/hugefile.dat: mapped 17179869184 bytes in 33822066943 ticks
> 749660: avg 13339 max 234667 ticks
> 371945: avg 26885 max 281616 ticks
> ---
> At this point, we see the following on the console:
> [593492.694806] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
> [593506.724367] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
> [593524.837717] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
> [593556.742386] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
>
> This is the same message presented in
> http://oss.sgi.com/bugzilla/show_bug.cgi?id=410
>
> We started testing with 2.6.38-rc7 and have seen this bug in every version through the 2.6.38 final release. It does not appear to be present in 2.6.33, but we have not tested any kernels in between. We have also tested with ext4 and do not encounter the bug there.
> CONFIG_XFS_FS=y
> CONFIG_XFS_QUOTA=y
> CONFIG_XFS_POSIX_ACL=y
> CONFIG_XFS_RT=y
> # CONFIG_XFS_DEBUG is not set
> # CONFIG_VXFS_FS is not set
>
> Here is the stack from the process:
> [<ffffffff81357553>] call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff812ddf1e>] xfs_ilock+0x7e/0x110
> [<ffffffff8130132f>] __xfs_get_blocks+0x8f/0x4e0
> [<ffffffff813017b1>] xfs_get_blocks+0x11/0x20
> [<ffffffff8114ba3e>] __block_write_begin+0x1ee/0x5b0
> [<ffffffff8114be9d>] block_page_mkwrite+0x9d/0xf0
> [<ffffffff81307e05>] xfs_vm_page_mkwrite+0x15/0x20
> [<ffffffff810f2ddb>] do_wp_page+0x54b/0x820
> [<ffffffff810f347c>] handle_pte_fault+0x3cc/0x820
> [<ffffffff810f5145>] handle_mm_fault+0x175/0x2f0
> [<ffffffff8102e399>] do_page_fault+0x159/0x470
> [<ffffffff816cf6cf>] page_fault+0x1f/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> # uname -a
> Linux testhost 2.6.38 #2 SMP PREEMPT Fri Mar 18 15:00:59 GMT 2011 x86_64 GNU/Linux
>
> Please let me know if additional information is required.
>
> Thanks!
>
> Sean
>