[NAK] Re: [PATCH] fs: Optimized fget to improve performance

From: Al Viro
Date: Thu Aug 27 2020 - 10:33:31 EST


On Thu, Aug 27, 2020 at 06:19:44PM +0800, Shaokun Zhang wrote:
> From: Yuqi Jin <jinyuqi@xxxxxxxxxx>
>
> It is well known that the performance of atomic_add is better than that of
> atomic_cmpxchg.
> The initial value of @f_count is 1. While @f_count is increased by 1 in
> __fget_files, it will go through three phases: > 0, < 0, and = 0. When the
> fixed value 0 is used as the condition for terminating the increase of 1,
> only atomic_cmpxchg can be used. When we use < 0 as the condition for
> stopping plus 1, we can use atomic_add to obtain better performance.

Suppose another thread has just removed it from the descriptor table.

> +static inline bool get_file_unless_negative(atomic_long_t *v, long a)
> +{
> + long c = atomic_long_read(v);
> +
> + if (c <= 0)
> + return 0;

Still 1. Now the other thread has gotten to dropping the last reference,
decremented counter to zero and committed to freeing the struct file.

> +
> + return atomic_long_add_return(a, v) - 1;

... and you increment that sucker back to 1. Sure, you return 0, so the
caller does nothing to that struct file. Which includes undoing the
changes to its refecount.

In the meanwhile, the third thread does fget on the same descriptor,
and there we end up bumping the refcount to 2 and succeeding. Which
leaves the caller with reference to already doomed struct file...

IOW, NAK - this is completely broken. The whole point of
atomic_long_add_unless() is that the check and conditional increment
are atomic. Together. That's what your optimization takes out.