Re: [failures] mm-vmscan-remove-unnecessary-lruvec-adding.patch removed from -mm tree
From: Qian Cai
Date: Thu Mar 05 2020 - 22:32:24 EST
> On Mar 5, 2020, at 9:50 PM, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
>
>
> The patch titled
> Subject: mm/vmscan: remove unnecessary lruvec adding
> has been removed from the -mm tree. Its filename was
> mm-vmscan-remove-unnecessary-lruvec-adding.patch
>
> This patch was dropped because it had testing failures
Andrew, do you have more information about this failure? I hit a bug
here under memory pressure and am wondering whether it is related,
which might save me some time digging.
[ 4389.727184][ T6600] mem_cgroup_update_lru_size(00000000bb31aaed, 0, -7): lru_size -1
[ 4389.735272][ T6600] WARNING: CPU: 9 PID: 6600 at mm/memcontrol.c:1287 mem_cgroup_update_lru_size+0x17d/0x1b0
[ 4389.745210][ T6600] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_amd kvm ses enclosure irqbypass dax_pmem dax_pmem_core efivars acpi_cpufreq efivarfs ip_tables x_tables xfs sd_mod smartpqi scsi_transport_sas tg3 mlx5_core libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
[ 4389.771620][ T6600] CPU: 9 PID: 6600 Comm: oom01 Tainted: G L 5.6.0-rc4-next-20200305+ #4
[ 4389.781209][ T6600] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[ 4389.790577][ T6600] RIP: 0010:mem_cgroup_update_lru_size+0x17d/0x1b0
[ 4389.797108][ T6600] Code: d9 c7 e5 ff 49 89 d9 45 89 e0 44 89 f1 4c 89 ea 48 c7 c6 a0 86 81 83 48 c7 c7 9e 07 9e 83 c6 05 90 53 18 01 01 e8 25 a5 c8 ff <0f> 0b eb bc 48 89 de 48 c7 c7 80 e7 ce 83 e8 10 14 23 00 e9 e1 fe
[ 4389.816750][ T6600] RSP: 0018:ffffbf7b0adc3598 EFLAGS: 00010082
[ 4389.822793][ T6600] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 0000000000000000
[ 4389.830737][ T6600] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffbf7b0adc341c
[ 4389.838685][ T6600] RBP: ffffbf7b0adc35d8 R08: 0000000000000000 R09: 0000bf7b0adc341c
[ 4389.846631][ T6600] R10: 0000bf7b0adc33a8 R11: 0000bf7b0adc341f R12: 00000000fffffff9
[ 4389.854556][ T6600] R13: ffff978a77534400 R14: 0000000000000000 R15: 0000000000000000
[ 4389.862525][ T6600] FS: 00007f64a8f3b700(0000) GS:ffff979272880000(0000) knlGS:0000000000000000
[ 4389.871498][ T6600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4389.878065][ T6600] CR2: 00007f632d210000 CR3: 000000067ee08000 CR4: 00000000003406e0
[ 4389.885986][ T6600] Call Trace:
[ 4389.889259][ T6600] isolate_lru_pages+0x6c5/0xfd0
[ 4389.894227][ T6600] ? __const_udelay+0x3c/0x40
[ 4389.898935][ T6600] shrink_inactive_list+0x18a/0x860
[ 4389.904182][ T6600] shrink_lruvec+0x5d9/0xb70
[ 4389.908736][ T6600] ? find_held_lock+0x35/0xa0
[ 4389.913382][ T6600] ? percpu_ref_put_many+0xdd/0x1c0
[ 4389.918579][ T6600] shrink_node+0x2d6/0xca0
[ 4389.923032][ T6600] do_try_to_free_pages+0x1f7/0x9a0
[ 4389.928226][ T6600] try_to_free_pages+0x252/0x5b0
[ 4389.933112][ T6600] __alloc_pages_slowpath+0x458/0x1290
[ 4389.938548][ T6600] __alloc_pages_nodemask+0x3bb/0x450
[ 4389.943889][ T6600] alloc_pages_vma+0x8a/0x2c0
[ 4389.948631][ T6600] do_anonymous_page+0x16e/0x6f0
[ 4389.953523][ T6600] ? __lock_acquire+0x443/0x37c0
[ 4389.958426][ T6600] __handle_mm_fault+0xce1/0xd50
[ 4389.963415][ T6600] handle_mm_fault+0xfc/0x2f0
[ 4389.968055][ T6600] do_page_fault+0x263/0x6f9
[ 4389.972629][ T6600] page_fault+0x34/0x40
[ 4389.976741][ T6600] RIP: 0033:0x411ab0
[ 4389.980600][ T6600] Code: 89 de e8 83 16 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 75 1c ff ff 31 d2 48 98 90 <c6> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[ 4390.000293][ T6600] RSP: 002b:00007f64a8f3aec0 EFLAGS: 00010206
[ 4390.006320][ T6600] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f837e05cb77
[ 4390.014254][ T6600] RDX: 00000000052d6000 RSI: 00000000c0000000 RDI: 0000000000000000
[ 4390.022213][ T6600] RBP: 00007f6327f3a000 R08: 00000000ffffffff R09: 0000000000000000
[ 4390.030150][ T6600] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
[ 4390.038104][ T6600] R13: 00007ffd7960ec0f R14: 0000000000000000 R15: 00007f64a8f3afc0
[ 4390.046046][ T6600] irq event stamp: 400622
[ 4390.050376][ T6600] hardirqs last enabled at (400621): [<ffffffff82b94df7>] free_unref_page_list+0x1c7/0x2b0
[ 4390.060430][ T6600] hardirqs last disabled at (400622): [<ffffffff832d8fbc>] _raw_spin_lock_irq+0x1c/0x60
[ 4390.070144][ T6600] softirqs last enabled at (400510): [<ffffffff8360034c>] __do_softirq+0x34c/0x57c
[ 4390.079487][ T6600] softirqs last disabled at (400501): [<ffffffff828c68d2>] irq_exit+0xa2/0xc0
[ 4390.088394][ T6600] ---[ end trace eb6136217ea3d652 ]---
[ 4390.093976][ T6600] ------------[ cut here ]------------
[ 4390.099379][ T6600] kernel BUG at mm/memcontrol.c:1288!
[ 4390.104712][ T6600] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ 4390.111523][ T6600] CPU: 9 PID: 6600 Comm: oom01 Tainted: G W L 5.6.0-rc4-next-20200305+ #4
[ 4390.121105][ T6600] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[ 4390.130485][ T6600] RIP: 0010:mem_cgroup_update_lru_size+0x13d/0x1b0
[ 4390.136987][ T6600] Code: 00 48 85 db 79 b7 48 c7 c7 78 32 db 83 e8 7b cd e5 ff 44 0f b6 3d db 53 18 01 41 80 ff 01 0f 87 e3 69 00 00 41 83 e7 01 74 0e <0f> 0b 48 c7 c7 70 e7 ce 83 e8 47 17 23 00 48 c7 c7 78 32 db 83 e8
[ 4390.156680][ T6600] RSP: 0018:ffffbf7b0adc3598 EFLAGS: 00010082
[ 4390.162716][ T6600] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 0000000000000000
[ 4390.170664][ T6600] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffbf7b0adc341c
[ 4390.178598][ T6600] RBP: ffffbf7b0adc35d8 R08: 0000000000000000 R09: 0000bf7b0adc341c
[ 4390.186551][ T6600] R10: 0000bf7b0adc33a8 R11: 0000bf7b0adc341f R12: 00000000fffffff9
[ 4390.194468][ T6600] R13: ffff978a77534400 R14: 0000000000000000 R15: 0000000000000000
[ 4390.202478][ T6600] FS: 00007f64a8f3b700(0000) GS:ffff979272880000(0000) knlGS:0000000000000000
[ 4390.211380][ T6600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4390.217923][ T6600] CR2: 00007f632d210000 CR3: 000000067ee08000 CR4: 00000000003406e0
[ 4390.225852][ T6600] Call Trace:
[ 4390.229064][ T6600] isolate_lru_pages+0x6c5/0xfd0
[ 4390.233926][ T6600] ? __const_udelay+0x3c/0x40
[ 4390.238594][ T6600] shrink_inactive_list+0x18a/0x860
[ 4390.243779][ T6600] shrink_lruvec+0x5d9/0xb70
[ 4390.248312][ T6600] ? find_held_lock+0x35/0xa0
[ 4390.252945][ T6600] ? percpu_ref_put_many+0xdd/0x1c0
[ 4390.258106][ T6600] shrink_node+0x2d6/0xca0
[ 4390.262472][ T6600] do_try_to_free_pages+0x1f7/0x9a0
[ 4390.267627][ T6600] try_to_free_pages+0x252/0x5b0
[ 4390.272527][ T6600] __alloc_pages_slowpath+0x458/0x1290
[ 4390.277953][ T6600] __alloc_pages_nodemask+0x3bb/0x450
[ 4390.283264][ T6600] alloc_pages_vma+0x8a/0x2c0
[ 4390.287889][ T6600] do_anonymous_page+0x16e/0x6f0
[ 4390.292760][ T6600] ? __lock_acquire+0x443/0x37c0
[ 4390.297650][ T6600] __handle_mm_fault+0xce1/0xd50
[ 4390.302551][ T6600] handle_mm_fault+0xfc/0x2f0
[ 4390.307177][ T6600] do_page_fault+0x263/0x6f9
[ 4390.311780][ T6600] page_fault+0x34/0x40
[ 4390.315899][ T6600] RIP: 0033:0x411ab0
[ 4390.319854][ T6600] Code: 89 de e8 83 16 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 75 1c ff ff 31 d2 48 98 90 <c6> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[ 4390.339502][ T6600] RSP: 002b:00007f64a8f3aec0 EFLAGS: 00010206
[ 4390.345521][ T6600] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f837e05cb77
[ 4390.353463][ T6600] RDX: 00000000052d6000 RSI: 00000000c0000000 RDI: 0000000000000000
[ 4390.361389][ T6600] RBP: 00007f6327f3a000 R08: 00000000ffffffff R09: 0000000000000000
[ 4390.369318][ T6600] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
[ 4390.377256][ T6600] R13: 00007ffd7960ec0f R14: 0000000000000000 R15: 00007f64a8f3afc0
[ 4390.385241][ T6600] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_amd kvm ses enclosure irqbypass dax_pmem dax_pmem_core efivars acpi_cpufreq efivarfs ip_tables x_tables xfs sd_mod smartpqi scsi_transport_sas tg3 mlx5_core libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
[ 4390.412408][ T6600] ---[ end trace eb6136217ea3d653 ]---
[ 4390.417817][ T6600] RIP: 0010:mem_cgroup_update_lru_size+0x13d/0x1b0
[ 4390.424306][ T6600] Code: 00 48 85 db 79 b7 48 c7 c7 78 32 db 83 e8 7b cd e5 ff 44 0f b6 3d db 53 18 01 41 80 ff 01 0f 87 e3 69 00 00 41 83 e7 01 74 0e <0f> 0b 48 c7 c7 70 e7 ce 83 e8 47 17 23 00 48 c7 c7 78 32 db 83 e8
[ 4390.443957][ T6600] RSP: 0018:ffffbf7b0adc3598 EFLAGS: 00010082
[ 4390.449975][ T6600] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 0000000000000000
[ 4390.457930][ T6600] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffbf7b0adc341c
[ 4390.465853][ T6600] RBP: ffffbf7b0adc35d8 R08: 0000000000000000 R09: 0000bf7b0adc341c
[ 4390.473808][ T6600] R10: 0000bf7b0adc33a8 R11: 0000bf7b0adc341f R12: 00000000fffffff9
[ 4390.481743][ T6600] R13: ffff978a77534400 R14: 0000000000000000 R15: 0000000000000000
[ 4390.489718][ T6600] FS: 00007f64a8f3b700(0000) GS:ffff979272880000(0000) knlGS:0000000000000000
[ 4390.498624][ T6600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4390.505162][ T6600] CR2: 00007f632d210000 CR3: 000000067ee08000 CR4: 00000000003406e0
[ 4390.513086][ T6600] Kernel panic - not syncing: Fatal exception
[ 4391.870599][ T6600] Shutting down cpus with NMI
[ 4391.875212][ T6600] Kernel Offset: 0x1800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 4391.886841][ T6600] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> ------------------------------------------------------
> From: Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx>
> Subject: mm/vmscan: remove unnecessary lruvec adding
>
> Patch series "per lruvec lru_lock for memcg", v9.
>
> A partial merge. The first 6 patches from a 20 patch series. Some code
> cleanups and minimal optimizations.
>
>
> This patch (of 6):
>
> We don't have to add a freeable page to the LRU and then remove it
> again. This change saves a couple of operations and makes the move clearer.
>
> The SetPageLRU needs to be kept here for list integrity.
> Otherwise:
>   #0 move_pages_to_lru             #1 release_pages
>                                    if (put_page_testzero())
>   if !put_page_testzero
>                                      !PageLRU //skip lru_lock
>                                      list_add(&page->lru,)
>   list_add(&page->lru,) //corrupt
>
> [akpm@xxxxxxxxxxxxxxxxxxxx: coding style fixes]
> Link: http://lkml.kernel.org/r/1583146830-169516-2-git-send-email-alex.shi@xxxxxxxxxxxxxxxxx
> Signed-off-by: Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
> Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
> Cc: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Kirill A. Shutemov <kirill@xxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: Mike Kravetz <kravetz@xxxxxxxxxx>
> Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
> mm/vmscan.c | 32 +++++++++++++++++++++-----------
> 1 file changed, 21 insertions(+), 11 deletions(-)
>
> --- a/mm/vmscan.c~mm-vmscan-remove-unnecessary-lruvec-adding
> +++ a/mm/vmscan.c
> @@ -1838,26 +1838,29 @@ static unsigned noinline_for_stack move_
>  	while (!list_empty(list)) {
>  		page = lru_to_page(list);
>  		VM_BUG_ON_PAGE(PageLRU(page), page);
> +		list_del(&page->lru);
>  		if (unlikely(!page_evictable(page))) {
> -			list_del(&page->lru);
>  			spin_unlock_irq(&pgdat->lru_lock);
>  			putback_lru_page(page);
>  			spin_lock_irq(&pgdat->lru_lock);
>  			continue;
>  		}
> -		lruvec = mem_cgroup_page_lruvec(page, pgdat);
>
> +		/*
> +		 * The SetPageLRU needs to be kept here for list integrity.
> +		 * Otherwise:
> +		 *   #0 move_pages_to_lru             #1 release_pages
> +		 *                                    if (put_page_testzero())
> +		 *   if !put_page_testzero
> +		 *                                      !PageLRU //skip lru_lock
> +		 *                                      list_add(&page->lru,)
> +		 *   list_add(&page->lru,) //corrupt
> +		 */
>  		SetPageLRU(page);
> -		lru = page_lru(page);
> -
> -		nr_pages = hpage_nr_pages(page);
> -		update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
> -		list_move(&page->lru, &lruvec->lists[lru]);
>
> -		if (put_page_testzero(page)) {
> +		if (unlikely(put_page_testzero(page))) {
>  			__ClearPageLRU(page);
>  			__ClearPageActive(page);
> -			del_page_from_lru_list(page, lruvec, lru);
>
>  			if (unlikely(PageCompound(page))) {
>  				spin_unlock_irq(&pgdat->lru_lock);
> @@ -1865,9 +1868,16 @@ static unsigned noinline_for_stack move_
>  				spin_lock_irq(&pgdat->lru_lock);
>  			} else
>  				list_add(&page->lru, &pages_to_free);
> -		} else {
> -			nr_moved += nr_pages;
> +			continue;
>  		}
> +
> +		lruvec = mem_cgroup_page_lruvec(page, pgdat);
> +		lru = page_lru(page);
> +		nr_pages = hpage_nr_pages(page);
> +
> +		update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
> +		list_add(&page->lru, &lruvec->lists[lru]);
> +		nr_moved += nr_pages;
>  	}
>
>  	/*
> _
>
> Patches currently in -mm which might be from alex.shi@xxxxxxxxxxxxxxxxx are
>
> ocfs2-remove-fs_ocfs2_nm.patch
> ocfs2-remove-unused-macros.patch
> ocfs2-use-ocfs2_sec_bits-in-macro.patch
> ocfs2-remove-dlm_lock_is_remote.patch
> ocfs2-remove-useless-err.patch
> mm-memcg-fold-lock_page_lru-into-commit_charge.patch
> mm-page_idle-no-unlikely-double-check-for-idle-page-counting.patch
> mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
> mm-thp-clean-up-lru_add_page_tail.patch
> mm-thp-narrow-lru-locking.patch
>