Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail

From: Qian Cai
Date: Tue Jul 07 2020 - 09:04:46 EST

Next message: Marek Szyprowski: "Re: [PATCH v7 04/36] drm: amdgpu: fix common struct sg_table related issues"
Previous message: Peter Zijlstra: "Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks"
In reply to: Michal Hocko: "Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail"
Next in thread: Michal Hocko: "Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Jul 07, 2020 at 02:06:19PM +0200, Michal Hocko wrote:
> On Tue 07-07-20 07:43:48, Qian Cai wrote:
> >
> >
> > > On Jul 7, 2020, at 6:28 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > Would you have any examples? Because I find this highly unlikely.
> > > OVERCOMMIT_NEVER only works when virtual memory is not largely
> > > overcommitted wrt real memory demand. And that tends to be more of
> > > an exception rather than a rule. "Modern" userspace (whatever that
> > > means) tends to be really hungry for virtual memory, which is only used
> > > very sparsely.
> > >
> > > I would argue that either somebody is running an "OVERCOMMIT_NEVER"
> > > friendly SW and this is a permanent setting or this is not used at all.
> > > At least this is my experience.
> > >
> > > So I strongly suspect that LTP test failure is not something we should
> > > really lose sleep over. It would be nice to find a way to flush existing
> > > batches but I would rather see a real workload that would suffer from
> > > this imprecision.
> >
> > I have heard you say many times that you don't really care about those
> > use cases unless you hear exactly how people are using them in your world.
> >
> > For example, last time you said the LTP OOM tests were totally
> > artificial and that you cared little whether they were failing, while
> > I have enjoyed their efficiency at finding many issues, like race
> > conditions and bad error accumulation handling, that your "real world
> > use cases" would take ages to flag, or could never flag at all.
>
> Yes, they are effective at hitting corner cases and that is fine. I
> am not dismissing their usefulness. I have tried to explain that many
> times but let me try again. Seeing a corner case and thinking about a
> potential fix is one thing. On the other hand, it is not really ideal to
> treat such a failure as a hard regression and consider otherwise useful

Well, terms like "corner cases" and "hard regression" are rather
subjective.

> functionality/improvement to be reverted without a proper cost-benefit
> analysis. Sure, having corner cases is not really nice, but really, look
> at this example again. The overcommit setting is a global thing; it is
> hard to change it at runtime willy-nilly, because that might have really
> detrimental side effects on all running workloads. So it is quite
> reasonable to expect that this happens either early after boot or when
> the system is in a quiescent state, when almost nothing but very core
> services are running and the likelihood that the mode of operation
> changes is low.

I am not really convinced that is the only way people will use those tunables.

>
> > There are just too many valid use cases in this wild world. The
> > difference is that I admit that I don't know, or am not even aware of,
> > all the use cases, and I don't believe you do either.
>
> Me neither, and I am not claiming that. All I am saying is that the real
> risk of a regression is low enough that I wouldn't lose sleep over
> it. It is perfectly fine to address this proactively if the fix is
> reasonably maintainable. I was mostly reacting to your pushing for a
> revert based solely on LTP results.
>
> LTP is a very useful tool to raise awareness of potential problems but
> you shouldn't really follow those results just blindly.

You must think I am a newbie tester, then, to give me this piece of
advice.
