Re: Quad core CPUs loaded at only 50% when running a CPU and mmap intensive multi-threaded task

From: edwin
Date: Mon Aug 25 2008 - 07:30:42 EST


edwin wrote:
Peter Zijlstra wrote:
On Mon, 2008-08-25 at 13:22 +0300, Török Edwin wrote:

Well, the real program (clamd) that this testprogram tries to simulate does an mmap for almost every file, and I have lots of small files.
6.5G, 114122 files, average size 57k.

I'll run latencytop again, last time it showed 100ms - 500ms latency

Latencytop output attached.
There is 4 - 60 ms latency for mmap/munmap, and the more threads there are, the higher the total latency gets (latencytop says the sum was ~480ms).
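
To make it easier to see where that mmap/munmap time goes for a single file, here is a minimal sketch of the per-file work, assuming the scanner maps each file, reads through it and unmaps it again. This is not clamd code and not the test program from this thread; scan_file_mmap() and ns_between() are hypothetical names, and the byte sum only stands in for real scanning.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC samples. */
static long long ns_between(const struct timespec *a, const struct timespec *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
           + (b->tv_nsec - a->tv_nsec);
}

/*
 * Hypothetical helper: map one file read-only, touch every byte, unmap it.
 * On success returns 0, stores the byte sum in *sum and the time spent
 * inside the mmap() and munmap() calls themselves in *map_ns.
 * Returns -1 on error or for an empty file (a zero-length mmap fails).
 */
static int scan_file_mmap(const char *path, unsigned long *sum, long long *map_ns)
{
    struct timespec t0, t1, t2, t3;
    struct stat st;
    unsigned long s = 0;
    unsigned char *p;
    size_t len, i;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    len = (size_t)st.st_size;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    for (i = 0; i < len; i++)        /* stand-in for the real scan */
        s += p[i];

    clock_gettime(CLOCK_MONOTONIC, &t2);
    munmap(p, len);
    clock_gettime(CLOCK_MONOTONIC, &t3);

    *sum = s;
    *map_ns = ns_between(&t0, &t1) + ns_between(&t2, &t3);
    return 0;
}
```

Calling this from N threads over a directory of small files and adding up map_ns per thread should give numbers comparable in spirit to what latencytop reports, though the exact figures will depend on kernel version and file layout.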

Running with MaxThreads 4 gets me 300-400% CPU usage, but with MaxThreads 8 CPU usage drops to around 120-250%.
Now, MaxThreads 4 looks like a good choice from a CPU usage point of view, but it is actually bad because threads get stuck in iowait and the CPU has nothing to do in the meantime. MaxThreads 8 looked like a good alternative to fill the iowait gaps, but then we run into the mmap_sem issue.
In a real world environment MaxThreads influences how many mails you can process in parallel with your MTA, so generally it should be as high as possible.

On 2.6.27-rc4:

MaxThreads 4 time, empty database (all cached, almost no I/O):
1m9s

MaxThreads 4 time, after echo 3 > /proc/sys/vm/drop_caches:
1m29s

MaxThreads 8 time, empty database (all cached, almost no I/O):
2m16s

MaxThreads 8 time, after echo 3 > /proc/sys/vm/drop_caches:
2m15s


MaxThreads 8, full DB (13% slower than 2.6.24):
4m42s

MaxThreads 4, full DB (8% faster than 2.6.24):
2m35s

MaxThreads 8, full DB, 2.6.24:
4m3s

MaxThreads 4, full DB, 2.6.24:
2m50s

I ran echo 3 > /proc/sys/vm/drop_caches before each run; I hope that clears all caches.
I have xfs on top of lvm, on top of raid10, and iostat shows only 0 - 20% activity (%util).
That could also mean of course that the disks can provide data fast enough for clamd.


Of course running with a full database will give different results, so I'll do some timing with that too (will take a little longer though).

for clamd, and it was about mmap, I'll provide you with the exact output.

Right - does it make sense to teach clamav about pread() ?

If it is preferred over mmap, then maybe yes.
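
For reference, a minimal sketch of what a pread()-based read of one small file could look like, assuming the whole file is simply pulled into a heap buffer before scanning. This is not clamav code; read_file_pread() is a hypothetical helper. The point is only that it avoids the per-file mmap/munmap pair; whether it is actually faster for this workload would have to be measured.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a whole file into a malloc'd buffer with pread().
 * Returns the buffer (caller frees it) and stores the number of bytes read
 * in *len_out, or returns NULL on error.
 */
static unsigned char *read_file_pread(const char *path, size_t *len_out)
{
    struct stat st;
    unsigned char *buf;
    size_t len, done = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size < 0) {
        close(fd);
        return NULL;
    }
    len = (size_t)st.st_size;

    buf = malloc(len ? len : 1);     /* avoid malloc(0) for empty files */
    if (!buf) {
        close(fd);
        return NULL;
    }

    while (done < len) {
        /* pread() takes an explicit offset and may return short reads */
        ssize_t n = pread(fd, buf + done, len - done, (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted, retry */
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;                   /* file shrank while reading */
        done += (size_t)n;
    }

    close(fd);
    *len_out = done;
    return buf;
}
```

With an average file size of 57k a single buffer per file is cheap; a per-thread buffer reused across files would avoid the malloc/free per file as well, but that is a separate design choice.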

Peter Zijlstra wrote:
OK, I'll poke a little more at it later today to see if I can spot something

Thanks!


Still, if I have more threads, performance *decreases* almost linearly with 2.6.27 (and probably with 2.6.25+ if clamd behaves the same as my test proggie), however with 2.6.24

With Debian etch having 2.6.24 (etchnhalf actually), and lenny shipping with 2.6.25 or 2.6.26, users upgrading from etch to lenny could see a performance decrease.

Best regards,
--Edwin