* Paulo Marques <pmarques@xxxxxxxxxxxx> wrote:
yeah. Maybe someone will find the time to improve the algorithm. But
it's not a high-priority thing.
Well, I found some time and decided to give it a go :)
great, your patch is looking really good!
The original algorithm took, on average, 1340us per lookup on my P4
2.8GHz. The compile settings for the test are not the same as in the
kernel, so these numbers can only be compared against other results
from the same setup.
ouch. I consider fixing this a quality-of-implementation issue.
With the attached patch it takes 14us per lookup. This is almost a
100x improvement.
wow! I have tried your patch and /proc/latency_trace now produces
instantaneous output.
...
your patch doesn't add all that much code. It adds 288 bytes to .text
and 64 bytes to .data. A typical .config generates 180K of compressed
kallsyms data (with !KALLSYMS_ALL), so your patch increases the kallsyms
overhead by a mere 0.2%. So it's really not an issue - especially since
kallsyms can be disabled in .config.
...
the standard way is to add the extra initializers. The gcc folks feel
that the benefit of catching lots of real bugs justifies those rare
cases where gcc gets it wrong. I've added the extra initialization to -O8.