Re: Linux 3.19-rc3
From: Will Deacon
Date: Mon Jan 12 2015 - 07:43:12 EST
On Sat, Jan 10, 2015 at 07:51:03PM +0000, Linus Torvalds wrote:
> On Sat, Jan 10, 2015 at 5:37 AM, Will Deacon <will.deacon@xxxxxxx> wrote:
> >>
> >> Will?
> >
> > I'm wondering if this is now broken in the fullmm case, because tlb->end
> > will be zero and we won't actually free any of the pages being unmapped
> > on task exit. Does that sound plausible?
>
> But did anything change wrt fullmm? I don't see any changes wrt fullmm
> logic in generic code.
No, and now that I've had a play, the issue is worse than I thought.
> The arm64 code changed more, so maybe there was something I missed.
> Again, arm64 uses tlb_end_vma() etc, so arm64 certainly triggers code
> that x86 does not.
Yup, and I think tlb_end_vma is also problematic. The issue is that we
reset the range in __tlb_end_vma, and therefore the batched pages don't
get freed. We also have an issue in the fullmm case, because
tlb_flush_mmu tests tlb->end != 0 before doing any freeing.
The fundamental problem is that tlb->end *only* indicates whether or not
TLB invalidation is required, *not* whether there are pages to be freed.
The advantage of the old need_flush flag was that it was sticky over
calls to things like tlb_flush_mmu_tlbonly.
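To make the leak concrete, here's a minimal user-space model of the
3.19-rc3 flow. The names mirror mm/memory.c, but the struct and the
printf-as-freeing are simplified stand-ins of my own, not the kernel
code:

#include <stdio.h>

/* Toy stand-in for the kernel's struct mmu_gather. */
struct mmu_gather {
	unsigned long start, end;	/* pending TLB invalidation range */
	int nr;				/* stand-in for tlb->local.nr */
};

/* Mirrors __tlb_reset_range() as called from __tlb_end_vma(). */
static void tlb_reset_range(struct mmu_gather *tlb)
{
	tlb->start = -1UL;		/* stand-in for TASK_SIZE */
	tlb->end = 0;
}

/* Mirrors the 3.19-rc3 tlb_flush_mmu(): end == 0 skips *everything*. */
static void tlb_flush_mmu(struct mmu_gather *tlb)
{
	if (!tlb->end)
		return;			/* batched pages are never freed */
	printf("invalidate [%lx, %lx), free %d pages\n",
	       tlb->start, tlb->end, tlb->nr);
	tlb->nr = 0;
}

int main(void)
{
	struct mmu_gather tlb = { 0 };

	tlb.nr = 3;		/* pages batched while unmapping a VMA */
	tlb_reset_range(&tlb);	/* tlb_end_vma() reset the range... */
	tlb_flush_mmu(&tlb);	/* ...so this returns early */
	printf("leaked %d pages\n", tlb.nr);	/* prints "leaked 3 pages" */
	return 0;
}

Because the reset clears tlb->end without touching the batch, the early
return skips the freeing entirely; the same thing happens on task exit
when fullmm leaves tlb->end == 0.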
> I can revert the commit that causes problems, but considering the
> performance impact on x86 (it would be a regression since 3.18), I
> would *really* like to fix the arm64 problem instead. So I'll wait
> with the revert for at least a week, I think, hoping that the arm64
> people figure this out. Sound reasonable?
Sure. I've put together the following patch, which (see the sketch after
this list):
(1) Uses tlb->end != 0 only as the predicate for TLB invalidation
(2) Uses tlb->local.nr as the predicate for page freeing in
tlb_flush_mmu_free
(3) Ensures tlb->end != 0 for the fullmm case (I think this was a
benign change introduced by my original patch, but this puts us
back to 3.18 behaviour)
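For comparison, here's the earlier toy model with those three changes
applied; again just a sketch with stand-in types, not the actual kernel
code:

#include <stdio.h>

struct mmu_gather {
	unsigned long start, end;
	int fullmm;
	int nr;				/* stand-in for tlb->local.nr */
};

/* (3): fullmm now keeps tlb->end != 0, so the exit-time flush fires. */
static void tlb_reset_range(struct mmu_gather *tlb)
{
	if (tlb->fullmm) {
		tlb->start = tlb->end = ~0UL;
	} else {
		tlb->start = -1UL;	/* stand-in for TASK_SIZE */
		tlb->end = 0;
	}
}

/* (1): tlb->end != 0 gates only the TLB invalidation. */
static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
{
	if (!tlb->end)
		return;
	printf("invalidate [%lx, %lx)\n", tlb->start, tlb->end);
}

/* (2): batch occupancy, not tlb->end, gates the page freeing. */
static void tlb_flush_mmu_free(struct mmu_gather *tlb)
{
	if (tlb->nr) {
		printf("free %d pages\n", tlb->nr);
		tlb->nr = 0;
	}
}

static void tlb_flush_mmu(struct mmu_gather *tlb)
{
	tlb_flush_mmu_tlbonly(tlb);
	tlb_flush_mmu_free(tlb);	/* frees even when tlb->end == 0 */
}

int main(void)
{
	struct mmu_gather tlb = { .fullmm = 1 };

	tlb_reset_range(&tlb);
	tlb.nr = 3;		/* pages batched during teardown */
	tlb_flush_mmu(&tlb);	/* invalidates *and* frees all 3 pages */
	return 0;
}

The ~0 sentinel is harmless for fullmm because the whole address space
is going away and the flush is unconditional anyway; all that matters
is that tlb->end is non-zero.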
Since this effectively reverts f045bbb9fa1b, I've added Dave to Cc. It
should fix the leak without reintroducing the performance regression.
Linus: this moves the tlb->end check back into tlb_flush_mmu_tlbonly.
How much do you hate that?
Will
--->8
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 08848050922e..db284bff29dc 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -136,8 +136,12 @@ static inline void __tlb_adjust_range(struct mmu_gather *tlb,
 
 static inline void __tlb_reset_range(struct mmu_gather *tlb)
 {
-	tlb->start = TASK_SIZE;
-	tlb->end = 0;
+	if (tlb->fullmm) {
+		tlb->start = tlb->end = ~0;
+	} else {
+		tlb->start = TASK_SIZE;
+		tlb->end = 0;
+	}
 }
 
 /*
diff --git a/mm/memory.c b/mm/memory.c
index c6565f00fb38..54f3a9b00956 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -235,6 +235,9 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long
 
 static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
+	if (!tlb->end)
+		return;
+
 	tlb_flush(tlb);
 	mmu_notifier_invalidate_range(tlb->mm, tlb->start, tlb->end);
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
@@ -247,7 +250,7 @@ static void tlb_flush_mmu_free(struct mmu_gather *tlb)
 {
 	struct mmu_gather_batch *batch;
 
-	for (batch = &tlb->local; batch; batch = batch->next) {
+	for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
 		free_pages_and_swap_cache(batch->pages, batch->nr);
 		batch->nr = 0;
 	}
@@ -256,9 +259,6 @@ static void tlb_flush_mmu_free(struct mmu_gather *tlb)
 
 void tlb_flush_mmu(struct mmu_gather *tlb)
 {
-	if (!tlb->end)
-		return;
-
 	tlb_flush_mmu_tlbonly(tlb);
 	tlb_flush_mmu_free(tlb);
 }