On 2016-10-19 10:59:33 [-0700], Davidlohr Bueso wrote:
Sebastian noted that the overhead of the worker thread ops (throughput)
accounting was causing 'perf' to appear in the profiles, consuming
a non-trivial (i.e. 13%) amount of CPU. This is due to cacheline
bouncing caused by the increment of w->ops. We can easily fix this by
just working on a local copy and updating the actual worker once
done running, and ready to show the program summary. There is no
danger of the worker being accessed concurrently, so we can trust that
no stale value is being seen by another thread.
Reported-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Acked-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -63,8 +63,9 @@ static const char * const bench_futex_hash_usage[] = {
 static void *workerfn(void *arg)
 {
 	int ret;
-	unsigned int i;
 	struct worker *w = (struct worker *) arg;
+	unsigned int i;
+	unsigned long ops = w->ops; /* avoid cacheline bouncing */
w->ops starts at 0, so there is probably no need to initialize the local copy with it.
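
For illustration only, here is a minimal standalone sketch of the pattern discussed above. It is not the futex-hash.c code: struct worker is reduced to two fields, the loop body stands in for one benchmark operation, and NTHREADS, ITERS, workers[] and run_workers() are hypothetical names made up for the example. Each thread counts in a local variable starting at 0 and stores the result into its worker slot once, so the shared array is not written on every iteration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4		/* hypothetical thread count */
#define ITERS 100000UL		/* hypothetical iterations per thread */

/* Simplified stand-in for the benchmark's per-thread state. */
struct worker {
	pthread_t thread;
	unsigned long ops;	/* read by the main thread only after join */
};

/* Adjacent elements may share a cacheline, which is what makes
 * per-iteration writes to w->ops expensive. */
static struct worker workers[NTHREADS];

static void *workerfn(void *arg)
{
	struct worker *w = arg;
	unsigned long ops = 0;	/* local copy, starts at 0 */
	unsigned long i;

	for (i = 0; i < ITERS; i++)
		ops++;		/* stands in for one benchmark operation */

	w->ops = ops;		/* single store to the shared array */
	return NULL;
}

/* Runs all workers to completion and returns the summed op count. */
static unsigned long run_workers(void)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < NTHREADS; i++) {
		int ret = pthread_create(&workers[i].thread, NULL,
					 workerfn, &workers[i]);
		assert(ret == 0);
		(void)ret;
	}

	for (i = 0; i < NTHREADS; i++) {
		/* join orders the worker's final store before this read */
		pthread_join(workers[i].thread, NULL);
		total += workers[i].ops;
	}

	return total;
}
```

Because the main thread reads workers[i].ops only after pthread_join() returns, nobody can observe a stale value while the thread is still counting locally.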