Re: Optimized memset, anybody willing to test?

David Mosberger-Tang (davidm@azstarnet.com)
Tue, 23 Apr 1996 19:32:48 -0700


>>>>> On Tue, 23 Apr 1996 21:01:24 +0300 (EET DST), Linus Torvalds <torvalds@cs.helsinki.fi> said:

Linus> This was written with the idea that "small is beautiful", so
Linus> I've tried to make it as small as possible, while at the same
Linus> time trying to do a good job of instruction scheduling (in
Linus> real life the performance is almost totally memory-bound, but
Linus> I'm anal). It's EV5-optimized as far as I can tell, as doing
Linus> the optimization for EV4 is boring..

I still think it would be awesome to have a sample ev5 scheduler like
the one that was provided for the ev4 (see example below). The
pipeline diagrams are extremely handy for manually optimizing assembly
code. To be honest, the ev4 scheduling rules were already too complex
for me to get right for anything but trivial code, so I don't even
try to work this stuff out by hand.

Somebody else asked for such a sample scheduler before and I haven't
seen any responses, but I haven't quite given up on it just yet.

--david

$ ev4sched foo.o
Basic block before code scheduling
|01234567890123456789| Cycle  Offset    Number  Instruction
|I                   |     0  00000000       0  LDD   F0,0(R16)
|.I                  |     1  00000004       1  LDD   F8,0(R0)
|  I                 |     2  00000008       2  LDD   F1,8(R16)
|  .I                |     3  0000000C       3  LDD   F9,8(R0)
|    I               |     4  00000010       4  MULG  F0,F5,F0
|    ......I         |    10  00000014       5  ADDG  F8,F0,F8
|           I        |    11  00000018       6  MULG  F1,F5,F1
|           ......I  |    17  0000001C       7  ADDG  F9,F1,F9
|                  I |    18  00000020       8  LDA   R16,10(R16)
|                  .I|    19  00000024       9  CMPLT R16,R10,R11
|I                   |    20  00000028      10  LDA   R0,10(R0)
|..I                 |    22  0000002C      11  STD   F8,0(R0)
|   I                |    23  00000030      12  STD   F9,8(R0)
|   I                |    23  00000034      13  BNE   R11,0

Basic block after code scheduling
|01234567890123456789| Cycle  Offset    Number  Instruction
|I                   |     0  00000000       0  LDD   F0,0(R16)
|.I                  |     1  00000004       2  LDD   F1,8(R16)
|  I                 |     2  00000008       1  LDD   F8,0(R0)
|  I                 |     2  0000000C       8  LDA   R16,10(R16)
|   I                |     3  00000010       4  MULG  F0,F5,F0
|   I                |     3  00000014       3  LDD   F9,8(R0)
|    I               |     4  00000018       6  MULG  F1,F5,F1
|    I               |     4  0000001C      10  LDA   R0,10(R0)
|     I              |     5  00000020       9  CMPLT R16,R10,R11
|     ....I          |     9  00000024       5  ADDG  F8,F0,F8
|          I         |    10  00000028       7  ADDG  F9,F1,F9
|          ...I      |    13  0000002C      11  STD   F8,0(R0)
|              I     |    14  00000030      12  STD   F9,8(R0)
|              I     |    14  00000034      13  BNE   R11,0
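
A rough way to compare the two listings is to read off the last issue
cycle and count the stall markers. The sketch below is only a minimal
example. It assumes each listing line has the layout shown above
(diagram between bars, then cycle, offset, instruction number,
instruction) and that each dot in the diagram marks one stall cycle.
The summarize helper and the LINE pattern are hypothetical names, not
part of ev4sched, and the input is just the last three lines of each
listing.

```python
import re

# Hypothetical helper, not part of ev4sched. Matches listing lines of the form
# "|<diagram>| <cycle> <offset> <number> <instruction>".
# Header lines (digits inside the bars) deliberately do not match.
LINE = re.compile(r"\|([ .I]*)\|\s*(\d+)\s+([0-9A-Fa-f]{8})\s+(\d+)\s+(.+)")


def summarize(listing):
    """Return (last issue cycle, number of stall dots) for one listing."""
    last_cycle = 0
    stalls = 0
    for line in listing.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue  # skip headers, titles and blank lines
        diagram, cycle = m.group(1), int(m.group(2))
        last_cycle = max(last_cycle, cycle)
        # Assumption: each '.' in the diagram is one stalled cycle.
        stalls += diagram.count(".")
    return last_cycle, stalls


# Last three lines of each listing, copied from the output above.
before_tail = """
|..I | 22 0000002C 11 STD F8,0(R0)
| I | 23 00000030 12 STD F9,8(R0)
| I | 23 00000034 13 BNE R11,0
"""
after_tail = """
| ...I | 13 0000002C 11 STD F8,0(R0)
| I | 14 00000030 12 STD F9,8(R0)
| I | 14 00000034 13 BNE R11,0
"""

if __name__ == "__main__":
    print("before:", summarize(before_tail))
    print("after: ", summarize(after_tail))
```

On these tail lines the last issue cycle drops from 23 to 14, which
matches the Cycle columns of the two listings.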