Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptivespinning

From: Darren Hart
Date: Tue Apr 06 2010 - 17:22:39 EST

Next message: Tejun Heo: "Re: [tip:perf/urgent] perf kmem: Fix breakage introduced by 5a0e3adslab.h script"
Previous message: David Howells: "Re: [PATCH] radix_tree_tag_get() is not as safe as the docs make out"
In reply to: Avi Kivity: "Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptivespinning"
Next in thread: Darren Hart: "Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptivespinning"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Darren Hart wrote:

Avi Kivity wrote:

> At 10%
duty cycle you have 25 waiters behind the lock on average. I don't think this is realistic, and it means that spinning is invoked only rarely.

Perhaps some instrumentation is in order, it seems to get invoked enough to achieve some 20% increase in lock/unlock iterations. Perhaps another metric would be of more value - such as average wait time?

Why measure an unrealistic workload?

No argument there, thus my proposal for an alternate configuration below.

I'd be interested in seeing runs where the average number of waiters is 0.2, 0.5, 1, and 2, corresponding to moderate-to-bad contention.
25 average waiters on compute bound code means the application needs to be rewritten, no amount of mutex tweaking will help it.

Perhaps something NR_CPUS threads would be of more interest?

That seems artificial.

How so? Several real world applications use one thread per CPU to dispatch work to, wait for events, etc.

At 10% that's about .8 and at 25% the 2 of your upper limit. I could add a few more duty-cycle points and make 25% the max. I'll kick that off and post the results... probably tomorrow, 10M iterations takes a while, but makes the results relatively stable.

Thanks. But why not vary the number of threads as well?

Absolutely, I don't disagree that all the variables should vary in order to get a complete picture. I'm starting with 8 - it takes several hours to collect the data.

While this might be of less interest after today's discussion, I promised to share the results of a run with 8 threads with a wider selection of lower duty-cycles. The results are very poor for adaptive and worse for aas (multiple spinners) compared to normal FUTEX_LOCK. As Thomas and Peter have pointed out, the implementation is sub-optimal. Before abandoning this approach I will see if I can find the bottlenecks and simplify the kernel side of things. My impression is that I am doing a lot more work in the kernel, especially in the adaptive loop, than is really necessary.

Both the 8 and 256 Thread plots can be viewed here:

http://www.kernel.org/pub/linux/kernel/people/dvhart/adaptive_futex/v4/

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Tejun Heo: "Re: [tip:perf/urgent] perf kmem: Fix breakage introduced by 5a0e3adslab.h script"
Previous message: David Howells: "Re: [PATCH] radix_tree_tag_get() is not as safe as the docs make out"
In reply to: Avi Kivity: "Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptivespinning"
Next in thread: Darren Hart: "Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptivespinning"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]