Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()

From: Richard Weinberger
Date: Sun May 04 2014 - 04:34:54 EST

Next message: Hongbo Zhang: "Re: [PATCH v4 7/8] DMA: Freescale: use spin_lock_bh instead of spin_lock_irqsave"
Previous message: Brilliantov Kirill Vladimirovich: "Re: [PATCH] staging/comedi: fix sparse warning: shift too big"
In reply to: Linus Torvalds: "Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()"
Next in thread: Linus Torvalds: "Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Linus,

Am 04.05.2014 01:57, schrieb Linus Torvalds:
> On Sat, May 3, 2014 at 4:37 PM, Richard Weinberger <richard@xxxxxx> wrote:
>> Commit 1cf35d47 (mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts)
>> accidently changed the behavior of the force_flush variable.
>
> No it didn't. There was nothing accidental about it, and it doesn't
> even change it the way you claim.
>
>> Before the patch it was set by __tlb_remove_page(). Now it is only set to 1
>> if __tlb_remove_page() returns false but never set back to 0 if __tlb_remove_page()
>> returns true.
>
> It starts out as zero. If __tlb_remove_page() returns true, it never
> gets set to anything *but* zero, except by the dirty shared mapping
> case that *needs* to set it to non-zero, exactly because it *needs* to
> flush the TLB before releasing the pte lock.
>
> Which was the whole point of the patch.
>
> Your explanation makes no sense for _another_ reason: even with your
> patch, it never gets set back to zero, since if it gets set to one you
> have that "break" in there. So the whole "gets set back to zero" is
> simply not relevant or true, with or with the patch.

Hmm, I got confused by:
if (PageAnon(page))
rss[MM_ANONPAGES]--;
else {
if (pte_dirty(ptent)) {
force_flush = 1;

Here you set force_flush.

set_page_dirty(page);
}
if (pte_young(ptent) &&
likely(!(vma->vm_flags & VM_SEQ_READ)))
mark_page_accessed(page);
rss[MM_FILEPAGES]--;
}
page_remove_rmap(page);
if (unlikely(page_mapcount(page) < 0))
print_bad_pte(vma, addr, ptent, page);
if (unlikely(!__tlb_remove_page(tlb, page))) {
force_flush = 1;
break;
}

And here it cannot get back to 0.

continue;

> The only place it actually gets zeroed (apart from initialization) is
> for the "goto again" case, which does it (and always did it)
>
>> Fixes BUG: Bad rss-counter state ...
>> and
>> kernel BUG at mm/filemap.c:202!
>
> So tell us more about those actual problems, because your patch and
> explanation is clearly wrong.
>
> What hardware, what load, what "kernel BUG at filemap.c:202"?

With your patch applied I see lots of BUG: Bad rss-counter state messages on UML (x86_32)
when fuzzing with trinity the mremap syscall.
And sometimes I face BUG at mm/filemap.c:202.

UML is here a bit special. It maps two pages into every process (the stub pages)
to issue mmap(), munmap() or mprotect() upon a page fault to fix memory mappings
for the faulting process on the host side.
It has to make sure that a guest process cannot mess with its stub pages.
Otherwise a guest could execute code on the host side.

Trinity manages to destroy these stub pages, UML detects this upon TLB handling
and kills the current process immediately.
After killing a trinity child I start observing the said issues.

e.g.
fix_range_common: failed, killing current process: 841
fix_range_common: failed, killing current process: 842
fix_range_common: failed, killing current process: 843
BUG: Bad rss-counter state mm:28e69600 idx:0 val:2

> The shared dirty fix may certainly be exposing some other issue, but
> the only report I have seen about filemap.c:202 was reported by Dave
> Jones ten *days* before the commit you talk about was even done.

Mea culpa, I've not noticed that fact.
Back to the drawing board...

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Hongbo Zhang: "Re: [PATCH v4 7/8] DMA: Freescale: use spin_lock_bh instead of spin_lock_irqsave"
Previous message: Brilliantov Kirill Vladimirovich: "Re: [PATCH] staging/comedi: fix sparse warning: shift too big"
In reply to: Linus Torvalds: "Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()"
Next in thread: Linus Torvalds: "Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]