Re: Locking L1 cache lines in Cyrix 6x86MX CPUs

Mike Jagdis (mike@roan.co.uk)
Tue, 19 May 1998 17:25:06 +0100 (GMT/BST)

Next message: Ondrej Feela Filip: "Re: Hard lockup with 2.0.34pre15"
Previous message: Alex Belits: "Re: unicode"

On Tue, 19 May 1998, André Derrick Balsa wrote:

> Hmmm. _Very_ interesting. I was thinking that perhaps the timer
> interrupt code could be kept in such a locked cache line, because on a
> busy machine it probably gets overwritten between the 10 ms periodic
> interrupts. But that's a hypothesis. Nobody seems to have quantitative
> data on this precise subject.

10ms is _forever_ in modern CPU terms :-). If the system is pretty
much idle, the timer code will probably be cached anyway. If the
system isn't idle, you need to decide whether you _really_ want to
potentially reduce application performance just so the tick handler
goes fast.
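
To put a number on "forever": the 10 ms period corresponds to the x86
Linux tick rate of HZ=100, and at an assumed 200 MHz core clock that is
about two million CPU cycles between timer interrupts. A minimal sketch
of the arithmetic (cycles_per_tick is a hypothetical helper, and the
clock speeds are illustrative assumptions, not measurements):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: core clock cycles that elapse between two
 * periodic timer interrupts, given the core clock in Hz and the
 * tick rate (HZ). */
static uint64_t cycles_per_tick(uint64_t core_hz, unsigned tick_hz)
{
    assert(tick_hz > 0);
    return core_hz / tick_hz;
}
```

With HZ=100, cycles_per_tick(200000000, 100) gives 2,000,000, which
is plenty of time for a busy application to evict the tick handler.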

> > My own feeling is that this is not so useful as it might appear
> > at first glance. If you _really_ want to try something interesting
> > why not write a gcc back end that uses a locked L1 line as a nice
> > big register file and see if you can push the x86 architecture to
> > new heights?
>
> That's another very interesting possible application for the 6x86MX L1
> cache locked lines. The x86 instruction set allows most instructions to
> address memory instead of CPU registers, with no additional CPU clock
> cycles.

But you do have to be careful, because you lose the benefits of
register renaming and the like, so you may _think_ you are doing
well but the pipelines could be foaming like cheap lager on a
hot day...

> Since the L1 cache is dual-ported, runs at the core clock speed and has
> no more latency than the usual x86 registers, locking a 1KB region
> could amount to having 256 general-purpose 32-bit registers.
>
> When one realizes how many gymnastics gcc is forced to perform because
> of the scarcity of registers in the x86 architecture, one begins to
> wonder how much performance one could gain with 256 more registers.
>
> Thanks for the tip :) Now who do I contact for more information on a
> possible gcc x86 back end?
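
The "256 registers" figure above is just 1024 bytes divided into 4-byte
words. A minimal sketch of that layout in C, assuming a 1KB locked
region; scratch_regs and LOCKED_BYTES are hypothetical names. It only
models the layout and does not lock anything into L1: actually locking
lines is CPU-specific and not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of the locked region, taken from the quoted message. */
#define LOCKED_BYTES 1024

/* Number of 32-bit "registers" that fit in the region. */
#define NREGS (LOCKED_BYTES / sizeof(uint32_t))

/* Hypothetical scratchpad: the 1KB region viewed as an array of
 * 32-bit registers. Layout only; no cache locking happens here. */
typedef struct {
    uint32_t r[NREGS];
} scratch_regs;

static_assert(NREGS == 256, "1KB / 4 bytes should give 256 registers");
static_assert(sizeof(scratch_regs) == LOCKED_BYTES,
              "scratchpad must occupy exactly the locked region");
```

Note that with only 1KB the claim holds for 1KB (kilobytes); a 1Kb
(kilobit) region would hold just 32 such registers.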

I would start by reading the gcc source and studying the existing
back ends for x86 and a register-rich one like Alpha. Next year
you might want to try changing a few things...

I had thought that it might be possible just to have gcc use an
explicitly locked region for temporaries. Then it occurred to me
that temporaries will usually be clustered together on the same
cache line anyway, so there may not be much benefit. The exception
is temporaries that are live across a function call or two; there
a locked scratchpad _might_ help, but then again the called
functions may work better with the extra cache space...
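
The clustering argument can be checked with simple arithmetic: a block
of temporaries only costs more than one cache line if it straddles a
line boundary. A minimal sketch, assuming 32-byte L1 lines (believed to
match the 6x86MX, but treat it as an assumption); lines_spanned is a
hypothetical helper, not part of gcc or any existing API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of cache lines touched by `size` bytes
 * of temporaries starting at byte `offset`, for `line_size`-byte lines. */
static size_t lines_spanned(size_t offset, size_t size, size_t line_size)
{
    assert(line_size > 0);
    if (size == 0)
        return 0;
    size_t first = offset / line_size;             /* line holding first byte */
    size_t last = (offset + size - 1) / line_size; /* line holding last byte  */
    return last - first + 1;
}
```

For example, eight 4-byte temporaries (32 bytes) touch one line when
they start on a line boundary (lines_spanned(0, 32, 32) is 1) but two
lines when they start halfway through one (lines_spanned(16, 32, 32)
is 2).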

Head hurting yet? :-)

Mike

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu
