Re: increased translation cache footprint in v2.6

From: Benjamin Herrenschmidt
Date: Tue Jun 28 2005 - 02:18:29 EST


On Mon, 2005-06-27 at 11:46 -0400, Dan Malek wrote:
> On Jun 26, 2005, at 3:09 PM, Marcelo Tosatti wrote:
>
> > That's a very interesting idea; it will probably improve performance in
> > general (a "why did nobody think of it before?" kind of idea).
>
> I've done this before, used the pgd/pmd or pte to hold large page
> size entries. The problem is the amount of code needed in the
> tlbmiss handler to implement this. The Linux page table structure
> doesn't allow us to easily format this information, so we have lots
> of code in the handler to fabricate these entries. It's a significant
> overhead for the normal 4K path that was hard to justify.

How so? The Linux page table structure allows you to format the PTE and
PMD contents pretty much any way you want...
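To make the PMD-level large-page idea concrete, here is a minimal C sketch of a software TLB-miss walk over a two-level table in which a PMD entry either points to a page of 4K PTEs or maps a whole 4MB region by itself. The entry layout, the flag bits (ENTRY_PRESENT, ENTRY_LARGE) and the helpers (map_large, map_small, tlb_miss_walk) are hypothetical illustrations, not the 8xx hardware format or the kernel's pmd_t encoding; a real handler is a short assembly routine that loads the TLB, not C code returning an address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit layout: 10-bit PGD index, 10-bit PTE index, 4 KB pages. */
#define PAGE_SHIFT    12
#define PGDIR_SHIFT   22
#define PTRS_PER_PGD  1024u
#define PTRS_PER_PTE  1024u
#define PAGE_MASK_LO  ((1u << PAGE_SHIFT) - 1)   /* offset within a 4 KB page */
#define LARGE_MASK_LO ((1u << PGDIR_SHIFT) - 1)  /* offset within a 4 MB region */

/* Hypothetical flag bits; NOT the real 8xx or Linux encoding. */
#define ENTRY_PRESENT 0x1u
#define ENTRY_LARGE   0x2u   /* PMD-level entry maps the whole 4 MB region */

typedef struct {
    uint32_t flags;     /* ENTRY_PRESENT, optionally ENTRY_LARGE */
    uint32_t phys;      /* large entry: physical base of the 4 MB region */
    uint32_t *pte_page; /* small entry: table of PTRS_PER_PTE PTEs */
} pmd_entry;

/* Map a 4 MB-aligned virtual region with a single PMD-level entry. */
static void map_large(pmd_entry *pgd, uint32_t vaddr, uint32_t phys)
{
    assert((vaddr & LARGE_MASK_LO) == 0 && (phys & LARGE_MASK_LO) == 0);
    pmd_entry *pmd = &pgd[vaddr >> PGDIR_SHIFT];
    pmd->flags = ENTRY_PRESENT | ENTRY_LARGE;
    pmd->phys = phys;
    pmd->pte_page = NULL;
}

/* Map one 4 KB page through an ordinary PTE page (hooked into the PMD). */
static void map_small(pmd_entry *pgd, uint32_t *pte_page,
                      uint32_t vaddr, uint32_t phys)
{
    assert((vaddr & PAGE_MASK_LO) == 0 && (phys & PAGE_MASK_LO) == 0);
    pmd_entry *pmd = &pgd[vaddr >> PGDIR_SHIFT];
    pmd->flags = ENTRY_PRESENT;
    pmd->phys = 0;
    pmd->pte_page = pte_page;
    pte_page[(vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)] = phys | ENTRY_PRESENT;
}

/* Software TLB-miss walk: returns 0 and sets *paddr, or -1 on a fault. */
static int tlb_miss_walk(const pmd_entry *pgd, uint32_t vaddr, uint32_t *paddr)
{
    const pmd_entry *pmd = &pgd[vaddr >> PGDIR_SHIFT];
    if (!(pmd->flags & ENTRY_PRESENT))
        return -1;
    /* The one extra test every miss pays, including the 4 KB path. */
    if (pmd->flags & ENTRY_LARGE) {
        *paddr = pmd->phys | (vaddr & LARGE_MASK_LO);
        return 0;
    }
    uint32_t pte = pmd->pte_page[(vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)];
    if (!(pte & ENTRY_PRESENT))
        return -1;
    *paddr = (pte & ~PAGE_MASK_LO) | (vaddr & PAGE_MASK_LO);
    return 0;
}
```

Under this assumed, free-form layout the 4K path pays a single extra flag test per miss; whether a real hardware entry format allows something this cheap is exactly what is being debated in this thread.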

> We need to be optimizing the applications, since that is where the
> real work is done and where the system spends most of its time.
> The kernel is easy to optimize with pinned entries, then we have the
> best solution. A minimal overhead for the 4K pages, plus an optimal
> kernel mapping.

Pinned entries are never a good solution, more like a workaround... It's
never good to pin entries in such a small TLB (though I can understand
that you may want to always pin the kernel's first entry). I don't think
it's necessary.

> I do want the solution of variable page sizes in the kernel, because
> then we don't have to reserve wired entries, providing the best solution.
> I'm always thinking about this and experimenting with it from time to
> time, but I haven't found a solution that is satisfactory to me :-)
> Maybe something like an early kernel/user test and separate code paths,
> but I now have a solution that eliminates our current test, and I don't
> want to put it back in :-) My holy grail is a 4-instruction TLB miss
> handler, but I haven't been able to get the PTEs formatted correctly so
> everyone is happy.

Paul told me the 8xx has some restrictions on what goes at the "PMD"
level that are a problem for us (is it the cache-inhibited bit?), and
thus we cannot fully do the PMD/PTE thing, but I don't know the details.
Can you tell me more?

For the kernel address space, however, we are pretty much free to do
what we want. The only thing for which the kernel needs page tables is
the vmalloc space. The rest can be implemented any way you want by arch
code (though it's often useful to also use page tables for I/O space).

Ben.

