Re: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64

From: David Hildenbrand
Date: Tue Apr 23 2024 - 06:19:00 EST


On 23.04.24 12:15, Ryan Roberts wrote:
Hi David,

Sorry for the slow reply on this; it was due to a combination of thinking a bit
more about the options here and being out on holiday.


No worries, there are things more important in life than ptep_get_lockless() :D

(1) seems like the easiest thing to do.

Yes, I'm very much in favour of easy.



Perhaps it's useful to enumerate why we dislike the current ptep_get_lockless()?

Well, you sent that patch series with "that aims to reduce the cost and
complexity of ptep_get_lockless() for arm64". (2) and (3) would achieve that. :)

Touche! I'd half forgotten that we were having this conversation in the context
of this series!

I guess your ptep_get_gup_fast() approach is very similar to
ptep_get_lockless_norecency()... So we are back to the beginning :)

Except that it would be limited to GUP-fast :)


But ultimately I've come to the conclusion that it is easy to reason about the
current arm64 ptep_get_lockless() implementation and see that it's correct. The
other options both have their drawbacks.

Yes.


Yes, there is a loop in the current implementation that would be nice to get rid
of, but I don't think it is really any worse than the cmpxchg loops we already
have in other helpers.
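
For readers following along, the comparison above can be made concrete with a
small user-space model. This is only a sketch under stated assumptions, not the
arm64 kernel code: the names entry_t, NR_ENTRIES, FLAG_ACCESS, FLAG_DIRTY,
read_block_lockless() and clear_flag_cmpxchg() are all hypothetical, and the
consistency check is far simpler than what the real ptep_get_lockless() has to
do. It shows a read-and-retry loop next to a cmpxchg loop, both of which spin
only when another writer races with them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model: a small block of entries that share access/dirty state. */
#define NR_ENTRIES  4
#define FLAG_ACCESS (UINT64_C(1) << 0)  /* assumed bit position */
#define FLAG_DIRTY  (UINT64_C(1) << 1)  /* assumed bit position */

typedef _Atomic uint64_t entry_t;

/*
 * Pattern 1: lockless read with retry.
 * Read the target entry, gather the access/dirty bits from every entry in
 * the block, then re-read the target. If it changed in between, start over.
 */
static uint64_t read_block_lockless(entry_t *block, unsigned int idx)
{
    uint64_t first, flags, again;

    assert(block && idx < NR_ENTRIES);
    do {
        first = atomic_load(&block[idx]);
        flags = 0;
        for (unsigned int i = 0; i < NR_ENTRIES; i++)
            flags |= atomic_load(&block[i]) & (FLAG_ACCESS | FLAG_DIRTY);
        again = atomic_load(&block[idx]);
    } while (again != first);

    return first | flags;
}

/*
 * Pattern 2: cmpxchg loop.
 * Clear a flag in a single entry and return the value seen before the
 * update. On failure, 'old' is refreshed with the current value and the
 * exchange is retried.
 */
static uint64_t clear_flag_cmpxchg(entry_t *entry, uint64_t flag)
{
    uint64_t old = atomic_load(entry);

    while (!atomic_compare_exchange_weak(entry, &old, old & ~flag))
        ;
    return old;
}
```

In this model both helpers have the same shape: an optimistic attempt followed
by a retry when a concurrent update is detected, which is the sense in which
the read loop is no worse than the cmpxchg loops.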

I'm not planning to pursue this any further. Thanks for the useful discussion
(as always).

Makes sense to me. Let's leave it as is for the time being (and also see if a
GUP-fast user that needs precise dirty/accessed actually gets real).

--
Cheers,

David / dhildenb