Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression
From: Dave Chinner
Date: Thu Aug 11 2016 - 20:54:57 EST
On Thu, Aug 11, 2016 at 09:55:33AM -0700, Linus Torvalds wrote:
> On Thu, Aug 11, 2016 at 8:57 AM, Christoph Hellwig <hch@xxxxxx> wrote:
> >
> > The one-liner below (not tested yet) to simply remove it should fix that
> > up. I also noticed we have a spurious pagefault_disable/enable; I
> > need to dig into the history of that first, though.
>
> Hopefully the pagefault_disable/enable doesn't matter for this case.
>
> Can we get this one-liner tested with the kernel robot for comparison?
> I really think a messed-up LRU list could cause bad IO patterns, and
> end up keeping dirty pages around that should be streaming out to disk
> and getting re-used, causing memory pressure etc. for no good reason.
>
> I think the mapping->tree_lock issue that Dave sees is interesting
> too, but the kswapd activity (and the extra locking it causes) could
> also be a symptom of the same thing - memory pressure due to just
> putting pages in the active file that simply shouldn't be there.
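
(For reference, the one-liner under discussion presumably just drops the
mark_page_accessed() call from iomap_write_actor(); the sketch below is
against the 4.8-rc fs/iomap.c, with the hunk context recalled from memory
rather than taken from this thread. The pagefault_disable/enable pair
Christoph mentions is visible in the surrounding lines:

--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ iomap_write_actor @@
 		pagefault_disable();
 		copied = copy_page_from_iter(page, offset, bytes, i);
 		pagefault_enable();

 		flush_dcache_page(page);
-		mark_page_accessed(page);
)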
So, removing mark_page_accessed() made the spinlock contention
*worse*.
  36.51%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.27%  [kernel]  [k] copy_user_generic_string
   3.73%  [kernel]  [k] _raw_spin_unlock_irq
   3.55%  [kernel]  [k] get_page_from_freelist
   1.97%  [kernel]  [k] do_raw_spin_lock
   1.72%  [kernel]  [k] __block_commit_write.isra.30
   1.44%  [kernel]  [k] __wake_up_bit
   1.41%  [kernel]  [k] shrink_page_list
   1.24%  [kernel]  [k] __radix_tree_lookup
   1.03%  [kernel]  [k] xfs_log_commit_cil
   0.99%  [kernel]  [k] free_hot_cold_page
   0.96%  [kernel]  [k] end_buffer_async_write
   0.95%  [kernel]  [k] delay_tsc
   0.94%  [kernel]  [k] ___might_sleep
   0.93%  [kernel]  [k] kmem_cache_alloc
   0.90%  [kernel]  [k] unlock_page
   0.82%  [kernel]  [k] kmem_cache_free
   0.74%  [kernel]  [k] up_write
   0.72%  [kernel]  [k] node_dirty_ok
   0.66%  [kernel]  [k] clear_page_dirty_for_io
   0.65%  [kernel]  [k] __mark_inode_dirty
   0.64%  [kernel]  [k] __block_write_begin_int
   0.58%  [kernel]  [k] xfs_inode_item_format
   0.57%  [kernel]  [k] __memset
   0.57%  [kernel]  [k] cancel_dirty_page
   0.56%  [kernel]  [k] down_write
   0.54%  [kernel]  [k] page_evictable
   0.53%  [kernel]  [k] page_mapping
   0.52%  [kernel]  [k] __slab_free
   0.49%  [kernel]  [k] xfs_do_writepage
   0.49%  [kernel]  [k] drop_buffers
-   41.82%    41.82%  [kernel]  [k] _raw_spin_unlock_irqrestore
   - 35.93% ret_from_fork
      - kthread
         - 29.76% kswapd
              shrink_node
              shrink_node_memcg.isra.75
              shrink_inactive_list
              shrink_page_list
              __remove_mapping
              _raw_spin_unlock_irqrestore
         - 7.13% worker_thread
            - process_one_work
               - 4.40% wb_workfn
                    wb_writeback
                    __writeback_inodes_wb
                    writeback_sb_inodes
                    __writeback_single_inode
                    do_writepages
                    xfs_vm_writepages
                    write_cache_pages
                    xfs_do_writepage
               - 2.71% xfs_end_io
                    xfs_destroy_ioend
                    end_buffer_async_write
                    end_page_writeback
                    test_clear_page_writeback
                    _raw_spin_unlock_irqrestore
   + 4.88% __libc_pwrite
The kswapd contention has jumped from 20% to 30% of the CPU time
in the profiles. I can't see how changing which LRU the page sits on
will improve the contention problem - at its source it's an N:1
problem, where the writing process and N kswapd threads are
all trying to take the same lock concurrently....
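
To make the choke point concrete, here is a heavily trimmed sketch of
the reclaim-side path from the profile above (mm/vmscan.c of that era,
simplified: shadow-entry, swapcache and accounting handling dropped):

static int __remove_mapping(struct address_space *mapping,
                            struct page *page, bool reclaimed)
{
        unsigned long flags;

        /* Every page reclaimed from this file serialises here. */
        spin_lock_irqsave(&mapping->tree_lock, flags);

        /* Freeze the refcount; bail if someone else holds a reference. */
        if (!page_ref_freeze(page, 2))
                goto cannot_free;

        /* A dirty page must go back for writeback, not be freed. */
        if (unlikely(PageDirty(page))) {
                page_ref_unfreeze(page, 2);
                goto cannot_free;
        }

        __delete_from_page_cache(page, NULL);

        /* The unlock that dominates the profile: N kswapd threads
         * funnelling through one per-inode lock. */
        spin_unlock_irqrestore(&mapping->tree_lock, flags);
        return 1;

cannot_free:
        spin_unlock_irqrestore(&mapping->tree_lock, flags);
        return 0;
}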
This is not the AIM7 problem we are looking for - what this test
demonstrates is a fundamental page cache scalability issue at the
design level: the mapping->tree_lock is a global serialisation
point for each inode's page cache....
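
The write side hits the same lock from the other direction - the
xfs_end_io chain in the call graph above ends in
test_clear_page_writeback(), which (trimmed to the relevant part,
mm/page-writeback.c of that era, accounting calls dropped) does:

int test_clear_page_writeback(struct page *page)
{
        struct address_space *mapping = page_mapping(page);
        int ret;

        if (mapping) {
                unsigned long flags;

                /* The same per-inode lock that kswapd is hammering. */
                spin_lock_irqsave(&mapping->tree_lock, flags);
                ret = TestClearPageWriteback(page);
                if (ret)
                        radix_tree_tag_clear(&mapping->page_tree,
                                             page_index(page),
                                             PAGECACHE_TAG_WRITEBACK);
                spin_unlock_irqrestore(&mapping->tree_lock, flags);
        } else {
                ret = TestClearPageWriteback(page);
        }
        return ret;
}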
I'm now going to test Christoph's theory that this is an "overwrite
doing lots of block mapping" issue. More on that to follow.
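
For anyone wanting to poke at the overwrite theory independently, a
hypothetical stand-in for the workload (not the actual aim7 job file)
is simply rewriting an already-allocated range, so every write still
has to map existing blocks:

$ xfs_io -f -c "pwrite -b 1m 0 4g" /mnt/scratch/file   # allocate
$ xfs_io -c "pwrite -b 1m 0 4g" /mnt/scratch/file      # pure overwrite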
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx