tracing: horrible read performance on host with many CPUs

From: Dmitry Monakhov
Date: Wed Aug 27 2014 - 04:50:50 EST



I tried to use tracing on a host with 32 CPUs, but read performance
turned out to be horrible:
dd if=/sys/kernel/debug/tracing/trace_pipe of=tmpfs/t3.log bs=1M
0+21268 records in
0+21267 records out
85701248 bytes (86 MB) copied, 26.1424 s, 3.3 MB/s
0+25706 records in
0+25705 records out
103600749 bytes (104 MB) copied, 31.6595 s, 3.3 MB/s
0+59204 records in
0+59203 records out
238746128 bytes (239 MB) copied, 73.4347 s, 3.3 MB/s
Since I've collected ~3 GB of data, simply copying it from the kernel
to tmpfs takes a long time.

AFAIU this happens because of the sub-optimal sorting procedure in
__find_next_entry(): for every entry it walks all CPUs and picks the
one with the smallest timestamp. This could be optimized simply by
fetching N entries at a time. Are there any plans to implement that?
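
To make the idea concrete, here is a rough user-space model in Python,
not kernel code. It assumes each per-CPU buffer is a static, already
sorted queue of (timestamp, payload) tuples. drain_linear() and
drain_batched() are hypothetical names, and the batching rule shown is
only one possible reading of "fetch N entries at a time": after one
scan, keep taking entries from the winning CPU while they are not newer
than the head of the runner-up CPU.

```python
import heapq
from collections import deque


def drain_linear(buffers):
    """One pass over all CPUs for every emitted entry."""
    out, scans = [], 0
    while any(buffers):
        scans += 1
        # Pick the non-empty buffer whose head has the smallest timestamp.
        best = min((b for b in buffers if b), key=lambda b: b[0][0])
        out.append(best.popleft())
    return out, scans


def drain_batched(buffers, n):
    """One pass over all CPUs for up to n emitted entries (n >= 1)."""
    out, scans = [], 0
    while any(buffers):
        scans += 1
        # Find the two buffers with the smallest head timestamps.
        live = heapq.nsmallest(2, (b for b in buffers if b),
                               key=lambda b: b[0][0])
        best = live[0]
        # Anything on `best` not newer than the runner-up's head is
        # globally next, so it can be taken without rescanning.
        limit = live[1][0][0] if len(live) > 1 else float("inf")
        taken = 0
        while best and taken < n and best[0][0] <= limit:
            out.append(best.popleft())
            taken += 1
    return out, scans
```

In this model the batched variant only saves scans when consecutive
entries come from the same CPU. A live ring buffer that is still being
written to would need additional care that the sketch ignores.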

BTW: What is the most convenient way to fetch a large amount of data
from the traces? One possible way is to dump the per-CPU traces
(20 MB/s in my case) and then merge the files according to timestamp.
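
A minimal sketch of that merge step in Python, assuming the per-CPU
dumps were taken from per_cpu/cpuN/trace_pipe in the default text
format ("task-pid [cpu] flags secs.usecs: event") and that each dump is
already in timestamp order. TS_RE, keyed_lines() and merge_traces() are
hypothetical names, and the regular expression is a guess at that
layout which may need adjusting for other trace options.

```python
import heapq
import re

# Assumed layout: "... [cpu] <optional flags> <secs>.<usecs>: event"
TS_RE = re.compile(r"\[\d+\]\s+(?:\S+\s+)?(\d+\.\d+):")


def keyed_lines(lines):
    """Yield (timestamp, line); untimed lines inherit the previous stamp."""
    last = 0.0
    for line in lines:
        m = TS_RE.search(line)
        if m:
            last = float(m.group(1))
        yield last, line


def merge_traces(streams):
    """Lazily merge sorted per-CPU line streams into timestamp order."""
    keyed = [keyed_lines(s) for s in streams]
    for _, line in heapq.merge(*keyed, key=lambda pair: pair[0]):
        yield line
```

Open file objects can be passed directly as the streams, for example
merge_traces([open("cpu0.log"), open("cpu1.log")]) with hypothetical
file names. The merge is lazy, so memory use stays small even for
multi-gigabyte dumps.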

