On 2017-07-28 02:28, Will Deacon wrote:
The patch also helps a lot on my platform. (It does cause a deadlock, related to udelay, in the uart driver during early boot, though I'm not sure whether that is a uart driver issue. I have just worked around it for now.)
On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
<snip>
I think we should have this discussion now - I brought this up earlier [1]
and I promised a test case that I completely forgot about - but here it
is (attached). Essentially a Big CPU in an acquire-check-release loop
will have an unfair advantage over a little CPU concurrently attempting
to acquire the same lock, in spite of the ticket implementation. If the Big
CPU needs the little CPU to make forward progress: livelock.
<snip>
One solution was to use udelay(1) in such loops instead of cpu_relax(), but
that's not very 'relaxing'. I'm not sure if there's something we could do
within the ticket spin-lock implementation to deal with this.
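To make the scenario above concrete, here is a minimal user-space sketch of a ticket lock and an acquire-check-release loop. It is not the arm64 kernel implementation and not the attached test case: the names `ticket_lock`, `relax()` and `poll_until_set()` are made up for illustration, and `relax()` is only a stand-in for cpu_relax().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative ticket lock (assumption: not the kernel's arch_spinlock_t). */
struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently being served */
};

/* Stand-in for cpu_relax(): only a compiler barrier here. */
static inline void relax(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
	/* Take a ticket, then spin until it is being served. */
	unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
						    memory_order_relaxed);

	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		relax();
}

static void ticket_lock_release(struct ticket_lock *l)
{
	/* Serve the next ticket. */
	atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}

/*
 * Acquire-check-release loop: take the lock, look at a flag, drop the
 * lock, and retry until the flag is set. Returns the number of
 * acquisitions it took.
 */
static unsigned long poll_until_set(struct ticket_lock *l, atomic_bool *flag)
{
	unsigned long iterations = 0;

	for (;;) {
		ticket_lock_acquire(l);
		bool done = atomic_load_explicit(flag, memory_order_relaxed);
		ticket_lock_release(l);
		iterations++;
		if (done)
			break;
		relax();
	}
	return iterations;
}
```

If the flag can only be set by another CPU that also has to take the same lock, the polling CPU depends on that other CPU winning the lock often enough, which is the situation described above.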
Does bodging cpu_relax to back-off to wfe after a while help? The event
stream will wake it up if nothing else does. Nasty patch below, but I'd be
interested to know whether or not it helps.
Will
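The patch itself is not reproduced in this message, so the following is only a sketch of the general idea of "spin for a while, then back off to wfe". The threshold `RELAX_SPIN_LIMIT`, the `relax_state` struct and the function names are all assumptions for illustration, not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical threshold; the value used by the real patch is not shown here. */
#define RELAX_SPIN_LIMIT 128u

struct relax_state {
	unsigned int spins;	/* relax calls since the last reset */
};

/* Call after making progress (e.g. after the lock was acquired). */
static void relax_reset(struct relax_state *s)
{
	s->spins = 0;
}

/*
 * Returns false while the caller should keep busy-spinning, and true
 * once the spin budget is used up and a low-power wait is preferable.
 */
static bool relax_should_wait(struct relax_state *s)
{
	if (s->spins < RELAX_SPIN_LIMIT) {
		s->spins++;
		return false;
	}
	return true;
}

/* Spin cheaply at first, then wait for an event on arm64. */
static void relax_backoff(struct relax_state *s)
{
	if (!relax_should_wait(s))
		return;
#if defined(__aarch64__)
	/* Woken by an event; relies on the event stream as a fallback. */
	__asm__ volatile("wfe" ::: "memory");
#endif
	/* Other architectures: no equivalent in this sketch, keep spinning. */
}
```

The counter is per waiter, so a CPU that keeps losing the race eventually stops hammering the lock and gives the other CPU a window to take it.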
This does seem to help. Here's some data after 5 runs with and without the patch.
time = max time taken to acquire lock
counter = number of times lock acquired
cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
Without the cpu_relax() bodging patch:
=====================================================
cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
==========|==============|===========|==============|
 117893us |      2349144 |       2us |      6748236 |
 571260us |      2125651 |       2us |      7643264 |
  19780us |      2392770 |       2us |      5987203 |
  19948us |      2395413 |       2us |      5977286 |
  19822us |      2429619 |       2us |      5768252 |
  19888us |      2444940 |       2us |      5675657 |
=====================================================
cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
With the cpu_relax() bodging patch:
=====================================================
cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
==========|==============|===========|==============|
      3us |      2737438 |       2us |      6907147 |
      2us |      2742478 |       2us |      6902241 |
    132us |      2745636 |       2us |      6876485 |
      3us |      2744554 |       2us |      6898048 |
      3us |      2741391 |       2us |      6882901 |
=====================================================
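For reference, the two columns per CPU could be collected along these lines. This is a guess at the bookkeeping, not the attached test case; `lock_stats` and `lock_stats_record()` are hypothetical names.

```c
#include <stdint.h>

/* Per-CPU figures matching the table columns (hypothetical layout). */
struct lock_stats {
	uint64_t max_wait_us;	/* "time": max time taken to acquire lock */
	uint64_t acquisitions;	/* "counter": number of times lock acquired */
};

/* Record one successful acquisition that took wait_us microseconds. */
static void lock_stats_record(struct lock_stats *s, uint64_t wait_us)
{
	if (wait_us > s->max_wait_us)
		s->max_wait_us = wait_us;
	s->acquisitions++;
}
```

Because "time" is a maximum, a single slow acquisition in a run (such as the 132us row) dominates that column even when every other acquisition was fast.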
The patch also seems to have helped with fairness in general,
allowing more work to be done when the CPU frequencies are more
closely matched (I don't know if this translates to real-world
performance - probably not). The counter values are higher
with the patch.
time = max time taken to acquire lock
counter = number of times lock acquired
cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
Without the cpu_relax() bodging patch:
=====================================================
cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
==========|==============|===========|==============|
      2us |      5240654 |       1us |      5339009 |
      2us |      5287797 |      97us |      5327073 |
      2us |      5237634 |       1us |      5334694 |
      2us |      5236676 |      88us |      5333582 |
     84us |      5285880 |      84us |      5329489 |
=====================================================
cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
With the cpu_relax() bodging patch:
=====================================================
cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
==========|==============|===========|==============|
    140us |     10449121 |       1us |     11154596 |
      1us |     10757081 |       1us |     11479395 |
     83us |     10237109 |       1us |     10902557 |
      2us |      9871101 |       1us |     10514313 |
      2us |      9758763 |       1us |     10391849 |
=====================================================
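One rough way to read the counter columns is the little CPU's share of all acquisitions in a run. The helper below is only an illustration of that arithmetic; `little_cpu_share()` is a made-up name and this metric is not part of the test case.

```c
/*
 * Fraction of all lock acquisitions in a run that went to the little
 * CPU. 0.5 would mean both CPUs acquired the lock equally often.
 */
static double little_cpu_share(unsigned long little, unsigned long big)
{
	unsigned long total = little + big;

	return total ? (double)little / (double)total : 0.0;
}
```

For the first 300MHz run this gives about 0.26 without the patch (2349144 of 9097380) and about 0.28 with it (2737438 of 9644585). For the first 1.5GHz run the total goes from 10579663 acquisitions without the patch to 21603717 with it.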
Thanks,
Vikram