AVC Seqlock vs. RCU experiment results

From: Kylene J Hall
Date: Mon Oct 11 2004 - 10:28:26 EST


The following are the results from our experiment. Since, after many
iterations based on comments, the patch is still not showing significant
performance improvements, we will turn our efforts towards looking at the
RCU patch.
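
For context, our patch ("myseq" in the tables below) guards the AVC with
a seqlock, so readers never block on a lock: a reader samples a sequence
counter, does its lookup, and retries if a writer bumped the counter in
the meantime. Here is a minimal sketch of that idiom using the stock
<linux/seqlock.h> API; the lock and function names are only
illustrative, not taken from the patch.

    #include <linux/seqlock.h>
    #include <linux/types.h>

    struct av_decision;             /* existing SELinux decision structure */

    /* 2.6-era static initializer for a seqlock. */
    static seqlock_t avc_lock = SEQLOCK_UNLOCKED;

    /* Read side: lockless, but retried if a writer ran concurrently. */
    static int avc_lookup_sketch(u32 ssid, u32 tsid, u16 tclass,
                                 struct av_decision *avd)
    {
            unsigned seq;
            int found;

            do {
                    seq = read_seqbegin(&avc_lock);   /* snapshot counter */
                    found = 0; /* ... walk hash bucket, copy into *avd ... */
            } while (read_seqretry(&avc_lock, seq)); /* writer ran: retry */

            return found;
    }

    /* Write side (inserting or replacing an entry) stays exclusive. */
    static void avc_update_sketch(void)
    {
            write_seqlock(&avc_lock);
            /* ... modify the cache; the unlock bumps the sequence count */
            write_sequnlock(&avc_lock);
    }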

In the results section:
  wselinux  corresponds to booting with the parameter selinux=1
  woselinux corresponds to booting with the parameter selinux=0

1way Results
Chatroom results (msg/sec), 4 iterations each, default configuration

myseq+wselinux    rcu+wselinux    Vanilla+wselinux
        491400          429184              194931
        300075          449943              428724
        466200          433839              324675
        440528          448933              430107
================================================================

Unixbench results (index values)

                                        myseq+wselinux  rcu+wselinux  Vanilla+wselinux
Dhrystone 2 using register variables             307.2         309.5             307.2
Double-Precision Whetstone                       145.6         146.3             146.3
Execl Throughput                                 621.4         651.4             643.5
File Copy 1024 bufsize 2000 maxblocks            640.6         533.9             650.4
File Copy 256 bufsize 500 maxblocks              614.3         431.2             669.0
File Copy 4096 bufsize 8000 maxblocks            528.2         498.5             513.6
Pipe Throughput                                  524.1         410.5             568.7
Process Creation                                 991.1         956.5             962.3
Shell Scripts (8 concurrent)                     485.5         501.7             495.0
System Call Overhead                             713.9         725.9             721.8
================================================================
Final index                                      503.2         466.1             513.1

4way Results
Chatroom results (msg/sec)

myseq+woselinux  myseq+wselinux  rcu+woselinux  rcu+wselinux  Vanilla+woselinux  Vanilla+wselinux
         466744          275103         378787        310318             455062            229885
         455062          318471         454029        389863             423280            209205
         437636          314712         431965        326264             372786            201005
         431499          374181         418410        309837             377714            191204
========================================================================

Unixbench results (index values)

                                        myseq+     myseq+     rcu+       rcu+       Vanilla+   Vanilla+
                                        woselinux  wselinux   woselinux  wselinux   woselinux  wselinux
Dhrystone 2 using register variables        131.5      131.5      131.4      131.5      131.1      131.3
Double-Precision Whetstone                   82.7       82.7       83.1       83.0       82.7       83.1
Execl Throughput                            311.8      285.8      309.0      297.3      309.5      296.0
File Copy 1024 bufsize 2000 maxblocks       172.7      154.3      174.1      159.1      173.0      155.8
File Copy 256 bufsize 500 maxblocks         159.2      135.2      159.8      141.7      161.0      138.0
File Copy 4096 bufsize 8000 maxblocks       209.4      195.5      209.4      201.5      209.4      200.5
Pipe Throughput                             240.0      162.8      238.0      195.5      240.3      187.3
Process Creation                            343.4      332.4      339.8      340.8      342.4      337.5
Shell Scripts (8 concurrent)                791.7      767.2      793.3      769.5      792.2      765.0
System Call Overhead                        375.9      377.6      374.9      373.4      375.7      377.3
================================================================
Final index                                 233.4      213.7      233.1      221.4      233.4      219.0


>> 1) the number of AVC entries is 3 times as big, increasing
>> the chance of L1/L2 cache misses, but decreasing the chance
>> of an AVC miss - dunno if this is good or bad

> We increased this number to attempt to do exactly that: produce fewer
> AVC misses. We saw about a 1-point improvement on the Unixbench
> benchmark.

>> 2) avc_claim_node doesn't check any more whether the avc
>> entry got recently used - dunno if this is good or bad

> My initial attempts at keeping this functionality seemed racy, but I
> think I could try:

> struct avc_node *claim_this_node = NULL;
> take read_lock
> while ( walking list ) {
>         if ( current_node->used == UNUSED ) {
>                 claim_this_node = current_node;
>                 break;
>         }
>         current_node->used = UNUSED;
> }
> if ( claim_this_node == NULL )
>         claim_this_node = round robin approach
>
> ...continue with current claim_node work
>
We have now tried this (results are included above); there was no real
noticeable improvement.
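
To make the scan concrete, here is a stand-alone sketch of that
"second chance" claim pass. The names and list layout are assumptions
for illustration, not code from the patch, and the read_lock that would
bracket the walk is left out so only the scan logic shows.

    #include <stddef.h>

    enum { UNUSED = 0, USED = 1 };

    struct avc_node {
            int used;               /* set on lookup hit, cleared below */
            struct avc_node *next;
    };

    static struct avc_node *claim_node(struct avc_node *head,
                                       struct avc_node **rr_cursor)
    {
            struct avc_node *node, *claim = NULL;

            for (node = head; node; node = node->next) {
                    if (node->used == UNUSED) {  /* idle since last pass */
                            claim = node;
                            break;
                    }
                    node->used = UNUSED;    /* give it a second chance */
            }

            if (claim == NULL) {            /* everything was recently used */
                    claim = *rr_cursor ? *rr_cursor : head; /* round robin */
                    *rr_cursor = claim ? claim->next : NULL;
            }
            return claim;
    }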

>> 3) the av_decision isn't part of the avc_entry any more
>> (again, dunno if this is good or bad)

> Here was our logic; correct me if I'm wrong. We assumed the cacheline
> size to be 128 bytes. Since we wanted each lock to protect a cacheline,
> or a couple of cachelines holding a group of entries with the same hash
> value, we needed to decrease the size of the entries in order to fit
> more into this region; thus we used a pointer to the av_decision
> structure, which accounts for most of the data stored per entry. This
> allowed us to pack the lookup data, which is accessed most frequently,
> into the adjacent cachelines. Does that make sense? Other suggestions?
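
To make that concrete, here is a sketch of the entry layout; the field
names are assumptions rather than code lifted from the patch, but
av_decision itself is the existing SELinux structure (roughly five u32s
of decision data).

    #include <linux/types.h>

    struct av_decision;             /* allowed/decided/audit vectors, seqno */

    struct avc_entry_sketch {
            u32 ssid;               /* source SID, compared on every lookup */
            u32 tsid;               /* target SID, compared on every lookup */
            u16 tclass;             /* object class, likewise */
            struct av_decision *avd; /* cold data, touched only on a hit */
    };

At about 16 bytes per entry on a 32-bit box, roughly twice as many
entries fit in each 128-byte line as with the decision data stored
inline, so each lock covers more of the hot lookup data.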


>> cheers,

>> Rik


Kylene J. Hall
IBM Linux Technology Center

Attachment: seq.patch
Description: Binary data