On 2021/5/28 9:31, Jason Wang wrote:
在 2021/5/27 下午5:03, Yunsheng Lin 写道:this commit log.
On 2021/5/27 16:05, Jason Wang wrote:
在 2021/5/27 下午3:21, Yunsheng Lin 写道:But the ring might be non-empty but __ptr_ring_empty() returns true
On 2021/5/27 14:53, Jason Wang wrote:Just to make sure we are at the same page. What I really meant is "synchronized" not "serialized". So they can be called at the same time but need synchronization.
在 2021/5/27 下午2:07, Yunsheng Lin 写道:I am saying if __ptr_ring_empty() and __ptr_ring_discard_one() is
On 2021/5/27 12:57, Jason Wang wrote:I'm not sure I understand. But my point is that you need to synchronize the __ptr_ring_discard_one() and __ptr_empty() as explained in the comment above __ptr_ring_empty():
在 2021/5/26 下午8:29, Yunsheng Lin 写道:Yes.
Currently r->queue[] is cleared after r->consumer_head is movedIf I understand this correctly, this can only happens if you run __ptr_ring_empty() in parallel with ptr_ring_discard_one().
forward, which makes the __ptr_ring_empty() checking called in
page_pool_refill_alloc_cache() unreliable if the checking is done
after the r->queue clearing and before the consumer_head moving
forward.
Move the r->queue[] clearing after consumer_head moving forward
to make __ptr_ring_empty() checking more reliable.
I think those two needs to be serialized. Or did I miss anything?As the below comment in __ptr_ring_discard_one, if the above is true, I
do not think we need to keep consumer_head valid at all times, right?
/* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
* to work correctly.
*/
always serialized, then it seems that the below commit is unnecessary?
406de7555424 ("ptr_ring: keep consumer_head valid at all times")This still needed in this case.
It means "the ring might be empty but __ptr_ring_empty() returns false"./*I am not sure I understand "incorrectly detecting the ring as empty"
* Test ring empty status without taking any locks.
*
* NB: This is only safe to call if ring is never resized.
*
* However, if some other CPU consumes ring entries at the same time, the value
* returned is not guaranteed to be correct.
*
* In this case - to avoid incorrectly detecting the ring
* as empty - the CPU consuming the ring entries is responsible
* for either consuming all ring entries until the ring is empty,
* or synchronizing with some other CPU and causing it to
* re-test __ptr_ring_empty and/or consume the ring enteries
* after the synchronization point.
means, is it because of the data race described in the commit log?
for the data race described in the commit log:)
Which commit log?
If the data race described in this commit log happens, the ring might be
non-empty, but __ptr_ring_empty() returns true.
As a matter of fact, I am not treating it as a performance optimization for this patch.
Yes, I am agreed there may not be any difference.Or other data race? I can not think of other data race if consumingIs __ptr_ring_empty() synchronized with the consumer in your case? If yes, have you done some benchmark to see the difference?
and __ptr_ring_empty() is serialized:)
I am agreed that __ptr_ring_empty() checking is not totally reliable
without taking r->consumer_lock, that is why I use "more reliable"
in the title:)
Have a look at page pool, this only helps when multiple refill request happens in parallel which can make some of the refill return early if the ring has been consumed.
This is the slow-path and I'm not sure we see any difference. If one the request runs faster then the following request will go through the fast path.
But it is better to make it more reliable, right?
No, any performance optimization must be benchmark to show obvious difference to be accepted.
ptr_ring has been used by various subsystems so we should not risk our self-eves to accept theoretical optimizations.
I treated it as improvement for the checking of __ptr_ring_empty().
But you are right that we need to ensure there is not performance regression when improving
it.
Any existing and easy-to-setup testcase to benchmark the ptr_ring performance?
If it really helps, can we do it more simpler by: