Re: [PATCH 8/8] shmem,percpu_counter: add _limited_add(fbc, limit, amount)

From: Dave Chinner
Date: Wed Oct 04 2023 - 19:10:23 EST


On Fri, Sep 29, 2023 at 08:42:45PM -0700, Hugh Dickins wrote:
> Percpu counter's compare and add are separate functions: without locking
> around them (which would defeat their purpose), it has been possible to
> overflow the intended limit. Imagine all the other CPUs fallocating
> tmpfs huge pages to the limit, in between this CPU's compare and its add.
>
> I have not seen reports of that happening; but tmpfs's recent addition
> of dquot_alloc_block_nodirty() in between the compare and the add makes
> it even more likely, and I'd be uncomfortable to leave it unfixed.
>
> Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it.
>
> I believe this implementation is correct, and slightly more efficient
> than the combination of compare and add (taking the lock once rather
> than twice when nearing full - the last 128MiB of a tmpfs volume on a
> machine with 128 CPUs and 4KiB pages); but it does beg for a better
> design - when nearing full, there is no new batching, but the costly
> percpu counter sum across CPUs still has to be done, while locked.
>
> Follow __percpu_counter_sum()'s example, including cpu_dying_mask as
> well as cpu_online_mask: but shouldn't __percpu_counter_compare() and
> __percpu_counter_limited_add() then be adding a num_dying_cpus() to
> num_online_cpus(), when they calculate the maximum which could be held
> across CPUs? But the times when it matters would be vanishingly rare.
>
> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Tim Chen <tim.c.chen@xxxxxxxxx>
> Cc: Dave Chinner <dchinner@xxxxxxxxxx>
> Cc: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---
> Tim, Dave, Darrick: I didn't want to waste your time on patches 1-7,
> which are just internal to shmem, and do not affect this patch (which
> applies to v6.6-rc and linux-next as is): but want to run this by you.

Hmmmm. IIUC, this only works for addition that approaches the limit
from below?

So if we are approaching the limit from above (i.e. add of a
negative amount, limit is zero) then this code doesn't work the same
as the open-coded compare+add operation would?

Hence I think this looks like a "add if result is less than"
operation, which is distinct from then "add if result is greater
than" operation that we use this same pattern for in XFS and ext4.
Perhaps a better name is in order?

I'm also not a great fan of having two
similar-but-not-quite-the-same implementations for the two
comparisons, but unless we decide to convert the XFs slow path to
this it doesn't matter that much at the moment....

Implementation seems OK at a quick glance, though.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx