Re: [PATCH v2 0/6] Rework REFCOUNT_FULL using atomic_fetch_* operations

From: Hanjun Guo
Date: Fri Sep 06 2019 - 21:57:19 EST


On 2019/9/6 21:43, Will Deacon wrote:
> On Wed, Aug 28, 2019 at 02:03:37PM -0700, Kees Cook wrote:
>> On Wed, Aug 28, 2019 at 03:14:40PM +0100, Will Deacon wrote:
>>> On Wed, Aug 28, 2019 at 09:30:52AM +0200, Peter Zijlstra wrote:
>>>> On Tue, Aug 27, 2019 at 05:31:58PM +0100, Will Deacon wrote:
>>>>> Will Deacon (6):
>>>>>   lib/refcount: Define constants for saturation and max refcount values
>>>>>   lib/refcount: Ensure integer operands are treated as signed
>>>>>   lib/refcount: Remove unused refcount_*_checked() variants
>>>>>   lib/refcount: Move bulk of REFCOUNT_FULL implementation into header
>>>>>   lib/refcount: Improve performance of generic REFCOUNT_FULL code
>>>>>   lib/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
>> BTW, can you repeat the timing details into the "Improve performance of
>> generic REFCOUNT_FULL code" patch?
> Of course.
>
>>>> So I'm not a fan; I itch at the whole racy nature of this thing and I
>>>> find the code less than obvious. Yet, I have to agree it is exceedingly
>>>> unlikely the race will ever actually happen, I just don't want to be the
>>>> one having to debug it.
>>> FWIW, I think much the same about the version under arch/x86 ;)
>>>
>>>> I've not looked at the implementation much; does it do all the same
>>>> checks the FULL one does? The x86-asm one misses a few iirc, so if this
>>>> is similarly fast but has all the checks, it is in fact better.
>>> Yes, it passes all of the REFCOUNT_* tests in lkdtm [1] so I agree that
>>> it's an improvement over the asm version.
>>>
>>>> Can't we make this a default !FULL implementation?
>>> My concern with doing that is I think it would make the FULL implementation
>>> entirely pointless. I can't see anybody using it, and it would only exist
>>> as an academic exercise in handling the theoretical races. That's a change
>>> from the current situation where it genuinely handles cases which the
>>> x86-specific code does not and, judging by the Kconfig text, that's the
>>> only reason for its existence.
>> Looking at timing details, the new implementation is close enough to the
>> x86 asm version that I would be fine to drop the x86-specific case
>> entirely as long as we could drop "FULL" entirely too -- we'd have _one_
>> refcount_t implementation: it would be both complete and fast.
> That works for me; I'll spin a new version of this series so you can see
> what it looks like.

I will wait for the new version and then run the performance tests on an ARM64 server.

Thanks
Hanjun
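
For anyone following the thread without the patches to hand, here is a
rough userspace sketch of the "fetch, then check, then saturate" idea the
series title refers to. It is not the code from the series: it uses C11
<stdatomic.h> rather than the kernel's atomic_t, and every sketch_* name
and SKETCH_SATURATED are made up for illustration. As far as I understand
it, the race discussed above is the window between the fetch and the
saturating store, which other CPUs can observe.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names: a userspace stand-in, not the kernel's refcount_t. */
#define SKETCH_SATURATED (INT_MIN / 2)

struct sketch_ref {
	atomic_int refs;
};

/* Pin the counter at the saturation value, far away from zero. */
static void sketch_saturate(struct sketch_ref *r)
{
	atomic_store_explicit(&r->refs, SKETCH_SATURATED, memory_order_relaxed);
}

/* Take a reference. Returns false if misuse was detected. */
static bool sketch_inc(struct sketch_ref *r)
{
	/* Unconditional add; C11 atomic arithmetic wraps without UB. */
	int old = atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);

	/* old == 0: object already freed. old < 0 or INT_MAX: overflowed. */
	if (old == 0 || old < 0 || old == INT_MAX) {
		sketch_saturate(r);
		return false;
	}
	return true;
}

/* Drop a reference. Returns true only when the last one was dropped. */
static bool sketch_dec_and_test(struct sketch_ref *r)
{
	int old = atomic_fetch_sub_explicit(&r->refs, 1, memory_order_release);

	if (old == 1) {
		/* Order the caller's free after all earlier accesses. */
		atomic_thread_fence(memory_order_acquire);
		return true;
	}
	if (old <= 0)
		sketch_saturate(r);	/* underflow, or already saturated */
	return false;
}

/* Small single-threaded walk through the normal and misuse paths. */
static void sketch_demo(void)
{
	struct sketch_ref r;

	atomic_init(&r.refs, 1);
	assert(sketch_inc(&r));			/* 1 -> 2 */
	assert(!sketch_dec_and_test(&r));	/* 2 -> 1 */
	assert(sketch_dec_and_test(&r));	/* 1 -> 0, last reference */
	assert(!sketch_inc(&r));		/* increment from 0 is flagged */
	assert(atomic_load(&r.refs) == SKETCH_SATURATED);
}
```

The point of the unconditional fetch-add is that the fast path is one
atomic read-modify-write with no compare-and-swap loop; the checks only
run on the value that was fetched. The cost is that a bad value is briefly
visible before the saturating store lands.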