Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail

From: Dennis Zhou
Date: Mon Jul 06 2020 - 21:06:58 EST


On Mon, Jul 06, 2020 at 09:24:43PM +0800, Feng Tang wrote:
> Hi All,
>
> Please help to review this fix patch, thanks!
>
> It is against today's linux-mm tree. For easy review, I put the fix
> into one patch, but I could split it into two parts, one for
> percpu_counter and one for mm/util.c, if that's preferred.
>
> From 593f9dc139181a7c3bb1705aacd1f625f400e458 Mon Sep 17 00:00:00 2001
> From: Feng Tang <feng.tang@xxxxxxxxx>
> Date: Mon, 6 Jul 2020 14:48:29 +0800
> Subject: [PATCH] mm/util.c: sync vm_committed_as when changing memory policy
> to OVERCOMMIT_NEVER
>
> With the patch to improve the scalability of vm_committed_as [1], 0day
> reported that the ltp overcommit_memory test case could fail (the failure
> rate is about 5/50) [2]. The root cause: when the system runs with a loose
> memory overcommit policy such as OVERCOMMIT_GUESS or OVERCOMMIT_ALWAYS, the
> deviation of vm_committed_as can be large, and once the policy is changed at
> runtime to OVERCOMMIT_NEVER, vm_committed_as's batch is decreased to 1/64 of
> the original one, but the accumulated deviation is not compensated
> accordingly. The following __vm_enough_memory() check for vm overcommit can
> then be wrong due to this deviation, which breaks the ltp overcommit_memory
> case.
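
As a rough illustration of the deviation being described (a sketch, not part
of the patch; committed_as_max_deviation() is a made-up helper): each CPU can
hold close to the batch size in its local counter before it is folded into
the global count, so the total can drift by roughly batch * num_online_cpus(),
and lowering the batch afterwards does not shrink deltas that were accumulated
under the old, larger batch.

#include <linux/cpumask.h>

/* Worst-case drift of a percpu counter for a given batch (sketch only). */
static inline s64 committed_as_max_deviation(s32 batch)
{
	return (s64)batch * num_online_cpus();
}
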
>
> Fix it by forcing a sync of the percpu counter vm_committed_as when the
> overcommit policy is changed to OVERCOMMIT_NEVER
> (sysctl -w vm.overcommit_memory=2). The sync itself is not a fast operation,
> but it is tolerable given that users are not expected to switch the policy
> to OVERCOMMIT_NEVER frequently.
>
> [1] https://lore.kernel.org/lkml/1592725000-73486-1-git-send-email-feng.tang@xxxxxxxxx/
> [2] https://marc.info/?l=linux-mm&m=159367156428286 (can't find a link in lore.kernel.org)
>
> Reported-by: kernel test robot <rong.a.chen@xxxxxxxxx>
> Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
> ---
> include/linux/percpu_counter.h | 4 ++++
> lib/percpu_counter.c | 14 ++++++++++++++
> mm/util.c | 11 ++++++++++-
> 3 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> index 0a4f54d..01861ee 100644
> --- a/include/linux/percpu_counter.h
> +++ b/include/linux/percpu_counter.h
> @@ -44,6 +44,7 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
> s32 batch);
> s64 __percpu_counter_sum(struct percpu_counter *fbc);
> int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
> +void percpu_counter_sync(struct percpu_counter *fbc);
>
> static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
> {
> @@ -172,6 +173,9 @@ static inline bool percpu_counter_initialized(struct percpu_counter *fbc)
> return true;
> }
>
> +static inline void percpu_counter_sync(struct percpu_counter *fbc)
> +{
> +}
> #endif /* CONFIG_SMP */
>
> static inline void percpu_counter_inc(struct percpu_counter *fbc)
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index a66595b..02d87fc 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -98,6 +98,20 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> }
> EXPORT_SYMBOL(percpu_counter_add_batch);
>
> +void percpu_counter_sync(struct percpu_counter *fbc)
> +{
> + unsigned long flags;
> + s64 count;
> +
> + raw_spin_lock_irqsave(&fbc->lock, flags);
> + count = __this_cpu_read(*fbc->counters);
> + fbc->count += count;
> + __this_cpu_sub(*fbc->counters, count);
> + raw_spin_unlock_irqrestore(&fbc->lock, flags);
> +}
> +EXPORT_SYMBOL(percpu_counter_sync);
> +
> +
> /*
> * Add up all the per-cpu counts, return the result. This is a more accurate
> * but much slower version of percpu_counter_read_positive()
> diff --git a/mm/util.c b/mm/util.c
> index 52ed9c1..5fb62c0 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -746,14 +746,23 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer,
> return ret;
> }
>
> +static void sync_overcommit_as(struct work_struct *dummy)
> +{
> + percpu_counter_sync(&vm_committed_as);
> +}
> +

This seems like a rather niche use case, as it's currently coupled with
schedule_on_each_cpu(): the helper only folds the calling CPU's delta, so I
can't imagine a case where you'd want to call it without going through
schedule_on_each_cpu().

Would it be better to modify percpu_counter_sum(), or introduce something
akin to it, that sums and also folds in the per-cpu counter state? I'd be
curious to see what the cost of always folding would be: this is already
considered the cold path, and it would help with the next batch too.
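
For concreteness, one possible shape of such a variant (an untested sketch;
percpu_counter_sum_and_fold() is a hypothetical name, and the locking just
mirrors the existing __percpu_counter_sum()):

#include <linux/percpu_counter.h>

/*
 * Sum all per-cpu deltas and fold them back into fbc->count, so that
 * later reads start from an accurate base (sketch only, untested).
 *
 * Note: zeroing a remote CPU's slot can race with a concurrent
 * percpu_counter_add() on that CPU, which is presumably part of the
 * cost/correctness question, and why the patch folds locally on each
 * CPU via schedule_on_each_cpu() instead.
 */
s64 percpu_counter_sum_and_fold(struct percpu_counter *fbc)
{
	s64 ret;
	int cpu;
	unsigned long flags;

	raw_spin_lock_irqsave(&fbc->lock, flags);
	ret = fbc->count;
	for_each_online_cpu(cpu) {
		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);

		ret += *pcount;
		*pcount = 0;
	}
	fbc->count = ret;
	raw_spin_unlock_irqrestore(&fbc->lock, flags);

	return ret;
}

Done that way, the sysctl handler could make a single call instead of
bouncing a work item through every CPU, if the remote-write race noted above
can be tolerated or avoided.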

> int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
> size_t *lenp, loff_t *ppos)
> {
> int ret;
>
> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> - if (ret == 0 && write)
> + if (ret == 0 && write) {
> + if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> + schedule_on_each_cpu(sync_overcommit_as);
> +
> mm_compute_batch();
> + }
>
> return ret;
> }
> --
> 2.7.4
>
>
> On Sun, Jul 05, 2020 at 10:36:14PM -0400, Qian Cai wrote:
> > > In my last email, I was not saying OVERCOMMIT_NEVER is not a normal case,
> > > but I don't think users will change the overcommit policy at runtime very
> > > often. And the fix patch that syncs 'vm_committed_as' only does so when the
> > > user calls 'sysctl -w vm.overcommit_memory=2'.
> > >
> > > > The question now is whether any of those regression fixes would regress
> > > > the performance of OVERCOMMIT_NEVER workloads, or stay on par with the
> > > > data from before the patchset?
> > >
> > > The original patchset keeps vm_committed_as's batch unchanged for the
> > > OVERCOMMIT_NEVER policy and enlarges it for the other two, looser policies,
> > > OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS, so I don't expect the performance
> > > of "OVERCOMMIT_NEVER workloads" to be impacted. If you have suggestions for
> > > this kind of benchmark, I can test them to better verify the patchset,
> > > thanks!
> >
> > Then, please capture that information in a proper commit log when you
> > submit the regression fix on top of the patchset, and CC the PER-CPU MEMORY
> > ALLOCATOR maintainers so they can review it properly.

Thanks,
Dennis