Re: [mm] 2d146aa3aa: vm-scalability.throughput -36.4% regression

From: Linus Torvalds
Date: Wed Aug 11 2021 - 02:00:41 EST


On Tue, Aug 10, 2021 at 4:59 PM kernel test robot <oliver.sang@xxxxxxxxx> wrote:
>
> FYI, we noticed a -36.4% regression of vm-scalability.throughput due to commit:
> 2d146aa3aa84 ("mm: memcontrol: switch to rstat")

Hmm. I guess some cost is to be expected, but that's a big regression.

I'm not sure what the code ends up doing, and how relevant this test
is, but Johannes - could you please take a look?

I can't make heads or tails of the profile. The profile kind of points at this:

> 2.77 ± 12% +27.4 30.19 ± 8% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> 2.86 ± 12% +27.4 30.29 ± 8% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 2.77 ± 12% +27.4 30.21 ± 8% perf-profile.children.cycles-pp.lock_page_lruvec_irqsave
> 4.26 ± 10% +28.1 32.32 ± 7% perf-profile.children.cycles-pp.lru_cache_add
> 4.15 ± 10% +28.2 32.33 ± 7% perf-profile.children.cycles-pp.__pagevec_lru_add

and that seems to be from the chain __do_fault -> shmem_fault ->
shmem_getpage_gfp -> lru_cache_add -> __pagevec_lru_add ->
lock_page_lruvec_irqsave -> _raw_spin_lock_irqsave ->
native_queued_spin_lock_slowpath.
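
IOW, the contention is on the lruvec spinlock that every pagevec
drain has to take. Roughly what that path looks like (simplified
from mm/swap.c, from memory, so details may be off):

        void __pagevec_lru_add(struct pagevec *pvec)
        {
                struct lruvec *lruvec = NULL;
                unsigned long flags = 0;
                int i;

                for (i = 0; i < pagevec_count(pvec); i++) {
                        struct page *page = pvec->pages[i];

                        /*
                         * Takes lruvec->lru_lock with irqs disabled,
                         * switching locks if the page belongs to a
                         * different memcg/node - this is the spinlock
                         * the profile shows everybody piling up on.
                         */
                        lruvec = relock_page_lruvec_irqsave(page, lruvec,
                                                            &flags);
                        __pagevec_lru_add_fn(page, lruvec);
                }
                if (lruvec)
                        unlock_page_lruvec_irqrestore(lruvec, flags);
                release_pages(pvec->pages, pvec->nr);
                pagevec_reinit(pvec);
        }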

That shmem_fault codepath being hot may make sense for some VM
scalability test. But it seems to make little sense when I look at the
commit that it bisected to.

We had another report of this commit causing a much more reasonable
small slowdown (-2.8%) back in May.

I'm not sure what's up with this new report. Johannes, does this make
sense to you?

Is it perhaps another "unlucky cache line placement" thing? Or have
the statistics changes just changed the behavior of the test?
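
If it's the former, the usual band-aid is to pad the hot lock out
onto its own cache line. Purely to illustrate the idiom - this is
hypothetical, and I haven't looked at what the commit actually did
to the struct layout:

        struct lruvec {
                struct list_head        lists[NR_LRU_LISTS];
                /*
                 * Hypothetical: keep the contended lock on its own
                 * cache line so that read-mostly fields the layout
                 * change may have moved next to it don't false-share
                 * with it.
                 */
                spinlock_t              lru_lock ____cacheline_aligned_in_smp;
                /* ... remaining fields as before ... */
        };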

Anybody?

Linus