Re: overzealous TLB flushing by lazy VMAP flushing

From: David Miller
Date: Mon Aug 04 2014 - 19:35:47 EST


From: David Miller <davem@xxxxxxxxxxxxx>
Date: Mon, 04 Aug 2014 16:23:14 -0700 (PDT)

Sorry, I screwed up the lkml CC:, fixing that here.

> Hey Nick,
>
> The lazy VMAP flushing in mm/vmalloc.c seems to make various
> assumptions about vmalloc area layout.
>
> In particular it assumes that if there are pending VMAP flushes
> in multiple regions managed by vmap/vunmap, it's safe to queue
> up a range flush from the lowest such address to the highest
> such address.
>
> This assumption does not hold on sparc64 and causes problems there,
> as diagnosed by Christopher (CC:'d).
>
> On sparc64 we have the following regions:
>
> modules        0x010000000 --> 0x0f0000000
> openfirmware   0x0f0000000 --> 0x100000000
> vmalloc        0x100000000 --> 0x10000000000
>
> So if a module is unloaded and some vfree() calls occur as well, the
> next lazy VMAP flush will flush a range that covers all of openfirmware.
>
> This will flush the firmware's locked TLB entries, which in turn causes
> all sorts of problems.
>
> It is not possible to adjust where these ranges are in order to make
> the vmalloc and module ranges be right next to each other.  The
> firmware area is fixed, first of all. Second of all the module area
> has to be in the low 4GB because of the code model we compile the
> kernel with (all symbols are 32-bit), and we want to use as little of
> the sub-4GB area as possible because it has to fit the main kernel
> image, modules, and the firmware region.
>
> We could add all sorts of range logic to the flush_tlb_range()
> implementation on sparc64, but I really think that the kernel should
> not trigger a TLB flush across a range for which it never managed any
> mappings.
>
> I also think that the lazy VMAP flusher should be mindful of this for
> another reason. Specifically, issuing such an enormous flush range is
> going to be expensive, more expensive than whatever we were gaining by
> batching these flushes.
>
> Unlike for userspace mappings, for kernel mappings we can't have a
> cutoff for page-by-page flushes and just do a context based TLB flush.
> We always have to do page-by-page flushes. So these huge ranges
> really do hurt.
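
To make the range problem concrete, here is a minimal userspace sketch of
the lowest-to-highest merge described above, applied to the sparc64 layout.
This is not the mm/vmalloc.c code: struct range, merge_min_max(),
overlaps() and pages() are hypothetical helpers made up for illustration,
and the 8K page size (shift of 13) is an assumption about the sparc64
configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* sparc64 regions from the table above, treated as [start, end). */
#define MODULES_START  0x010000000ULL
#define MODULES_END    0x0f0000000ULL
#define OFW_START      0x0f0000000ULL
#define OFW_END        0x100000000ULL
#define VMALLOC_START  0x100000000ULL
#define VMALLOC_END    0x10000000000ULL

/* Hypothetical helper type: a half-open address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Collapse all pending flushes into a single range running from the
 * lowest start to the highest end, as the lazy flusher is described
 * as doing.  Any gap between the inputs ends up inside the result.
 */
static struct range merge_min_max(const struct range *r, size_t n)
{
	struct range m;
	size_t i;

	assert(n > 0);
	m = r[0];
	for (i = 1; i < n; i++) {
		if (r[i].start < m.start)
			m.start = r[i].start;
		if (r[i].end > m.end)
			m.end = r[i].end;
	}
	return m;
}

/* True if the two half-open ranges share at least one address. */
static bool overlaps(struct range a, struct range b)
{
	return a.start < b.end && b.start < a.end;
}

/* Number of pages covered by a range, given the page shift. */
static uint64_t pages(struct range r, unsigned int page_shift)
{
	return (r.end - r.start) >> page_shift;
}
```

With one single-page unmap at the bottom of the module area and another at
the bottom of the vmalloc area (addresses picked only for illustration),
neither range touches openfirmware on its own, but the merged range covers
all of it and spans 491521 8K pages where only 2 needed flushing.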