Re: kernel 3.0: BUG: soft lockup: find_get_pages+0x51/0x110
From: Hugh Dickins
Date: Sun Oct 16 2011 - 23:03:22 EST
I've not read through and digested Andrea's reply yet, but I'd say
this is not something we need to rush into 3.1 at the last moment,
before it's been fully considered: the bug here is hard to hit,
ancient, made more likely in 2.6.35 by compaction and in 2.6.38 by
THP's reliance on compaction, but not a regression in 3.1 at all - let
it wait until stable.
Hugh
On Sun, Oct 16, 2011 at 3:37 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> What's the status of this thing? Is it stable/3.1 material? Do we have
> ack/nak's for it? Anybody?
>
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Linus
>
> On Thu, Oct 13, 2011 at 4:16 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>>
>> [PATCH] mm: add anon_vma locking to mremap move
>>
>> I don't usually pay much attention to the stale "? " addresses in
>> stack backtraces, but this lucky report from Pawel Sikora hints that
>> mremap's move_ptes() has inadequate locking against page migration.
>>
>> Â3.0 BUG_ON(!PageLocked(p)) in migration_entry_to_page():
>> Âkernel BUG at include/linux/swapops.h:105!
>> ÂRIP: 0010:[<ffffffff81127b76>] Â[<ffffffff81127b76>]
>> Â Â Â Â Â Â Â Â Â Â Â migration_entry_wait+0x156/0x160
>> Â[<ffffffff811016a1>] handle_pte_fault+0xae1/0xaf0
>> Â[<ffffffff810feee2>] ? __pte_alloc+0x42/0x120
>> Â[<ffffffff8112c26b>] ? do_huge_pmd_anonymous_page+0xab/0x310
>> Â[<ffffffff81102a31>] handle_mm_fault+0x181/0x310
>> Â[<ffffffff81106097>] ? vma_adjust+0x537/0x570
>> Â[<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
>> Â[<ffffffff81109a05>] ? do_mremap+0x2d5/0x570
>> Â[<ffffffff81421d5f>] page_fault+0x1f/0x30
>>
>> mremap's down_write of mmap_sem, together with i_mmap_mutex/lock,
>> and pagetable locks, were good enough before page migration (with its
>> requirement that every migration entry be found) came in; and enough
>> while migration always held mmap_sem. ÂBut not enough nowadays, when
>> there's memory hotremove and compaction: anon_vma lock is also needed,
>> to make sure a migration entry is not dodging around behind our back.
>>
>> It appears that Mel's a8bef8ff6ea1 "mm: migration: avoid race between
>> shift_arg_pages() and rmap_walk() during migration by not migrating
>> temporary stacks" was actually a workaround for this in the special
>> common case of exec's use of move_pagetables(); and we should probably
>> now remove that VM_STACK_INCOMPLETE_SETUP stuff as a separate cleanup.
>>
>> Reported-by: Pawel Sikora <pluto@xxxxxxxx>
>> Cc: stable@xxxxxxxxxx
>> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
>> ---
>>
>> Âmm/mremap.c | Â Â5 +++++
>> Â1 file changed, 5 insertions(+)
>>
>> --- 3.1-rc9/mm/mremap.c 2011-07-21 19:17:23.000000000 -0700
>> +++ linux/mm/mremap.c  2011-10-13 14:36:25.097780974 -0700
>> @@ -77,6 +77,7 @@ static void move_ptes(struct vm_area_str
>> Â Â Â Â Â Â Â Âunsigned long new_addr)
>> Â{
>> Â Â Â Âstruct address_space *mapping = NULL;
>> + Â Â Â struct anon_vma *anon_vma = vma->anon_vma;
>> Â Â Â Âstruct mm_struct *mm = vma->vm_mm;
>> Â Â Â Âpte_t *old_pte, *new_pte, pte;
>> Â Â Â Âspinlock_t *old_ptl, *new_ptl;
>> @@ -95,6 +96,8 @@ static void move_ptes(struct vm_area_str
>> Â Â Â Â Â Â Â Âmapping = vma->vm_file->f_mapping;
>> Â Â Â Â Â Â Â Âmutex_lock(&mapping->i_mmap_mutex);
>> Â Â Â Â}
>> + Â Â Â if (anon_vma)
>> + Â Â Â Â Â Â Â anon_vma_lock(anon_vma);
>>
>> Â Â Â Â/*
>> Â Â Â Â * We don't have to worry about the ordering of src and dst
>> @@ -121,6 +124,8 @@ static void move_ptes(struct vm_area_str
>> Â Â Â Â Â Â Â Âspin_unlock(new_ptl);
>> Â Â Â Âpte_unmap(new_pte - 1);
>> Â Â Â Âpte_unmap_unlock(old_pte - 1, old_ptl);
>> + Â Â Â if (anon_vma)
>> + Â Â Â Â Â Â Â anon_vma_unlock(anon_vma);
>> Â Â Â Âif (mapping)
>> Â Â Â Â Â Â Â Âmutex_unlock(&mapping->i_mmap_mutex);
>> Â Â Â Âmmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end);
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/