Re: increased translation cache footprint in v2.6

From: David S. Miller
Date: Sun Jun 26 2005 - 20:00:07 EST


From: Marcelo Tosatti <marcelo.tosatti@xxxxxxxxxxxx>
Date: Sun, 26 Jun 2005 16:09:44 -0300

> Thats a very interesting idea, will probably optimize performance in
> general ("why did nobody thought of it before?" kind).
>
> The increase in TLB miss handler size might be offset by the reduced
> kernel misses...
>
> Dan, what do you think?

I doubt it, it cost a single comparison on sparc64 to implement this.

Basically, the TLB miss handler for data accesses on sparc64 looks
like the following.

Load the miss information:

ldxa [%g1 + %g1] ASI_DMMU, %g4 ! Get TAG_ACCESS

If "TAG_CONTEXT_BITS" is zero, it's for the kernel:

andcc %g4, TAG_CONTEXT_BITS, %g0 ! From Nucleus?
be,pn %xcc, 3f ! Yep, special processing
...

If the virtual address has the top-most bit set (and thus it's
"negative"), then it's in the physical memory direct mapping
area. The trap handler register %g2 is preloaded with a fixed
value, that when XOR'd with the fault address information
produces a suitable PTE for loading right into the TLB. This
PTE uses 4MB pages.

3: brlz,pt %g4, 9b ! Kernel virtual map?
xor %g2, %g4, %g5 ! Finish bit twiddles

...

Store the calculated TLB entry and return from the trap.

9: stxa %g5, [%g0] ASI_DTLB_DATA_IN ! Reload TLB
retry ! Trap return

So that's 7 instructions, 2 instruction cache lines, with no main
memory accesses. Surely the PPC folks can do something similar. :-)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/