On Thursday 20 June 2002 01:54, Stephen C. Tweedie wrote:
> > I'm checking out a proper hash function at the moment.
>
> Done, checked into ext3 cvs (features-branch again.)
>
> Deleting and recreating 100,000 files with this kernel:
>
> [root@spock test0]# time xargs rm -f < /root/flist.100000
>
> real 0m14.305s
> user 0m0.750s
> sys 0m5.430s
> [root@spock test0]# time xargs touch < /root/flist.100000
>
> real 0m16.244s
> user 0m0.530s
> sys 0m6.660s
>
> that's an average of 160usec per create, 140usec per delete elapsed
> time, and 66/54usec respectively system time.
>
> I assume the elapsed time is greater only because we're starting to
> wrap the journal due to the large amount of metadata being touched
> (we're touching a lot of inodes doing the above, which I could avoid
> by making hard links instead of new files.) Certainly, limiting the
> test to 10,000 files lets it run at 100% cpu.

I ran a bakeoff between your new half-md4 and dx_hack_hash on Ext2. As
predicted, half-md4 does produce very even bucket distributions. For 200,000
creates:

  half-md4:      2872 avg bytes filled per 4k block (70%)
  dx_hack_hash:  2853 avg bytes filled per 4k block (69%)

But guess which was faster overall?

  half-md4:      user 0.43  system 6.88  real 0:07.33  CPU 99%
  dx_hack_hash:  user 0.43  system 6.40  real 0:06.82  CPU 100%

This is quite reproducible: dx_hack_hash is always faster, by about 6%. This
must be due entirely to the difference in hashing cost, since half-md4
produces measurably better distributions, which should if anything work in
its favor. Now what do we do?
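For readers who have not seen it, here is a sketch of the dx_hack_hash side of the comparison, modeled on the unsigned-char variant that later Linux kernels carry in fs/ext4/hash.c. The 2002 patch may differ in details such as char signedness, so check the constants against the source before relying on them. The bucket_spread() helper and its "f0", "f1", ... naming scheme are hypothetical: it only counts how names fall into equal slices of the 32-bit hash space, a rough proxy for evenness, not htree's real leaf-block splitting or the bytes-per-block figures above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch of dx_hack_hash, modeled on the unsigned-char variant in later
 * kernels' fs/ext4/hash.c. Verify constants against the kernel source. */
static uint32_t dx_hack_hash(const char *name, int len)
{
    uint32_t hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
    const unsigned char *ucp = (const unsigned char *)name;

    while (len--) {
        /* Mix each byte (times a constant) with the two previous states. */
        hash = hash1 + (hash0 ^ ((uint32_t)*ucp++ * 7152373u));
        /* Fold the sum back so the state stays below 2^31. */
        if (hash & 0x80000000u)
            hash -= 0x7fffffffu;
        hash1 = hash0;
        hash0 = hash;
    }
    return hash0 << 1;  /* the low bit of the result is always zero */
}

/* Hypothetical helper: hash names "f0".."f<nnames-1>" into 2^bits equal
 * slices of the hash space and report the emptiest and fullest slice.
 * A simplified proxy for distribution quality, not htree's splitting. */
static void bucket_spread(int nnames, int bits, unsigned *min, unsigned *max)
{
    static unsigned counts[1 << 16];
    char name[32];

    assert(bits >= 1 && bits <= 16);
    for (int b = 0; b < (1 << bits); b++)
        counts[b] = 0;
    for (int i = 0; i < nnames; i++) {
        int len = snprintf(name, sizeof name, "f%d", i);
        /* The top `bits` bits of the hash select the slice. */
        counts[dx_hack_hash(name, len) >> (32 - bits)]++;
    }
    *min = *max = counts[0];
    for (int b = 1; b < (1 << bits); b++) {
        if (counts[b] < *min) *min = counts[b];
        if (counts[b] > *max) *max = counts[b];
    }
}
```

For example, bucket_spread(200000, 8, &lo, &hi) reports the emptiest and fullest of 256 slices; the closer both sit to 200000/256, the more even the hash is on that name set. Swapping a different hash into the same harness gives a like-for-like comparison.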

By the way, I'm getting about 37 usec per create here, on a 1GHz/1GB PIII
with Ext2. I think most of the difference from your timings is that your test
code is eating a lot of CPU.
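The per-operation figures in this thread are just elapsed (or system) time divided by the operation count. A minimal sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>

/* Convert a total time in seconds into microseconds per operation. */
static double usec_per_op(double seconds, long ops)
{
    assert(ops > 0);
    return seconds * 1e6 / (double)ops;
}

/* Time saved by `fast` relative to `slow`, as a percentage of `slow`. */
static double pct_saved(double slow, double fast)
{
    assert(slow > 0.0);
    return (slow - fast) / slow * 100.0;
}
```

Plugging in the numbers above: 7.33 s for 200,000 creates is about 36.7 usec per create, Stephen's 16.244 s for 100,000 creates is about 162 usec, and for the single run shown, the gap between 7.33 s and 6.82 s real is a saving of just under 7%.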
--
Daniel