Re: [mm/gup] 47e29d32af: phoronix-test-suite.npb.FT.A.total_mop_s -45.0% regression

From: Oliver Sang
Date: Thu Nov 19 2020 - 21:28:29 EST


On Wed, Nov 18, 2020 at 10:17:27AM -0800, Dan Williams wrote:
> On Wed, Nov 18, 2020 at 5:51 AM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Mon 16-11-20 19:35:31, John Hubbard wrote:
> > >
> > > On 11/16/20 6:48 PM, kernel test robot wrote:
> > > >
> > > > Greeting,
> > > >
> > > > FYI, we noticed a -45.0% regression of phoronix-test-suite.npb.FT.A.total_mop_s due to commit:
> > > >
> > >
> > > That's a huge slowdown...
> > >
> > > >
> > > > commit: 47e29d32afba11b13efb51f03154a8cf22fb4360 ("mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > ...but that commit happened in April, 2020. Surely if this were a serious
> > > issue we would have some other indication...is this worth following up
> > > on?? I'm inclined to ignore it, honestly.
> >
> > Why this was detected so late is a fair question although it doesn't quite
> > invalidate the report...
>
> I don't know what specifically happened in this case, perhaps someone
> from the lkp team can comment?

- Additional phoronix test suites are being enabled and fixed gradually, so our
coverage keeps improving.
- We scan the kernel releases within the past year to baseline performance; this
may trigger a bisection if a release has regressed and not recovered.

With this continuous effort, 0-day CI can detect such changes on mainline.
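
As a rough illustration of the baseline scan described above, the check can be sketched as follows. This is only a sketch under stated assumptions, not the actual lkp/0-day code: the function names, the -10% threshold, and the per-release scores are all hypothetical, and higher scores are assumed to be better (as with total_mop_s).

```python
def percent_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def find_unrecovered_regression(scores, threshold_pct=-10.0):
    """Find a release-to-release drop that is still present in the latest release.

    scores: list of (release, score) tuples in chronological order.
    threshold_pct: hypothetical cutoff; a change at or below it counts as a regression.
    Returns (last_good_release, first_bad_release), or None if nothing qualifies.
    """
    if len(scores) < 2:
        return None
    latest_score = scores[-1][1]
    for (prev_rel, prev_score), (cur_rel, cur_score) in zip(scores, scores[1:]):
        regressed = percent_change(prev_score, cur_score) <= threshold_pct
        # "Not recovered": the newest release is still below the last good score.
        still_bad = percent_change(prev_score, latest_score) <= threshold_pct
        if regressed and still_bad:
            return prev_rel, cur_rel
    return None


# Made-up numbers for illustration only; they are not measurements.
history = [("v5.6", 100.0), ("v5.7", 55.0), ("v5.8", 54.0), ("v5.9", 56.0)]
bounds = find_unrecovered_regression(history)
if bounds is not None:
    good, bad = bounds
    print(f"regression between {good} and {bad}; candidate range for bisection")
```

The returned pair gives the good and bad endpoints that a bisection between the two releases would start from.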

> However, the myth / contention that
> "surely someone else would have noticed by now" is why the lkp project
> was launched. Kernels regressed without much complaint and it wasn't
> until much later in the process, around the time enterprise distros
> rebased to new kernels, did end users start filing performance loss
> regression reports. Given -stable kernel releases, 6-7 months is still
> faster than many end user upgrade cycles to new kernel baselines.