Re: [PATCH 2/2] thp: support split page table lock

From: Alex Thorlton
Date: Fri Sep 06 2013 - 12:04:27 EST


On Thu, Sep 05, 2013 at 05:27:46PM -0400, Naoya Horiguchi wrote:
> Thp related code also uses per process mm->page_table_lock now.
> So making it fine-grained can provide better performance.
>
> This patch makes thp support split page table lock by using page->ptl
> of the pages storing "pmd_trans_huge" pmds.
>
> Some functions like pmd_trans_huge_lock() and page_check_address_pmd()
> are expected by their caller to pass back the pointer of ptl, so this
> patch adds to those functions new arguments for that. Rather than that,
> this patch gives only straightforward replacement.
>
> ChangeLog v3:
> - fixed argument of huge_pmd_lockptr() in copy_huge_pmd()
> - added missing declaration of ptl in do_huge_pmd_anonymous_page()

I've applied these and tested them using the same test program that I
used when I was working on the same issue, and I'm running into some
bugs. Here's a stack trace:

general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 268 PID: 32381 Comm: memscale Not tainted 3.11.0-medusa-03121-g757f8ca #184
Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
task: ffff880fbdd82180 ti: ffff880fc0c5a000 task.ti: ffff880fc0c5a000
RIP: 0010:[<ffffffff810e3eef>] [<ffffffff810e3eef>] pgtable_trans_huge_withdraw+0x38/0x60
RSP: 0018:ffff880fc0c5bc88 EFLAGS: 00010297
RAX: ffffea17cebe8838 RBX: 00000015309bd000 RCX: ffffea01f623b028
RDX: dead000000100100 RSI: ffff8dcf77d84c30 RDI: ffff880fbda67580
RBP: ffff880fc0c5bc88 R08: 0000000000000013 R09: 0000000000014da0
R10: ffff880fc0c5bc88 R11: ffff888f7efda000 R12: ffff8dcf77d84c30
R13: ffff880fc0c5bdf8 R14: 800005cf401ff067 R15: ffff8b4de5fabff8
FS: 0000000000000000(0000) GS:ffff880fffd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffff768b0b8 CR3: 0000000001a0b000 CR4: 00000000000407e0
Stack:
ffff880fc0c5bcc8 ffffffff810f7643 ffff880fc0c5bcc8 ffffffff810d8297
ffffea1456237510 00007fc7b0e00000 0000000000000000 00007fc7b0c00000
ffff880fc0c5bda8 ffffffff810d85ba ffff880fc0c5bd48 ffff880fc0c5bd68
Call Trace:
[<ffffffff810f7643>] zap_huge_pmd+0x4c/0x101
[<ffffffff810d8297>] ? tlb_flush_mmu+0x58/0x75
[<ffffffff810d85ba>] unmap_single_vma+0x306/0x7d6
[<ffffffff810d8ad9>] unmap_vmas+0x4f/0x82
[<ffffffff810dab5e>] exit_mmap+0x8b/0x113
[<ffffffff810a9743>] ? __delayacct_add_tsk+0x170/0x182
[<ffffffff8103c609>] mmput+0x3e/0xc4
[<ffffffff8104088c>] do_exit+0x380/0x907
[<ffffffff810fb89c>] ? vfs_write+0x149/0x1a3
[<ffffffff81040e85>] do_group_exit+0x72/0x9b
[<ffffffff81040ec0>] SyS_exit_group+0x12/0x16
[<ffffffff814f52d2>] system_call_fastpath+0x16/0x1b
Code: 51 20 48 8d 41 20 48 39 c2 75 0d 48 c7 87 28 03 00 00 00 00 00 00 eb 36 48 8d 42 e0 48 89 87 28 03 00 00 48 8b 51 20 48 8b 41 28 <48> 89 42 08 48 89 10 48 ba 00 01 10 00 00 00 ad de 48 b8 00 02
RIP [<ffffffff810e3eef>] pgtable_trans_huge_withdraw+0x38/0x60
RSP <ffff880fc0c5bc88>
---[ end trace e5413b388b6ea448 ]---
Fixing recursive fault but reboot is needed!
general protection fault: 0000 [#2] SMP
Modules linked in:
CPU: 268 PID: 1722 Comm: kworker/268:1 Tainted: G D 3.11.0-medusa-03121-g757f8ca #184
Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
Workqueue: events vmstat_update
task: ffff880fc1a74280 ti: ffff880fc1a76000 task.ti: ffff880fc1a76000
RIP: 0010:[<ffffffff810bcdcb>] [<ffffffff810bcdcb>] free_pcppages_bulk+0x97/0x329
RSP: 0018:ffff880fc1a77c98 EFLAGS: 00010082
RAX: ffff880fffd94d68 RBX: dead0000002001e0 RCX: ffff880fffd94d50
RDX: ffff880fffd94d68 RSI: 000000000000001f RDI: ffff888f7efdac68
RBP: ffff880fc1a77cf8 R08: 0000000000000400 R09: ffffffff81a8bf00
R10: ffff884f7efdac00 R11: ffffffff81009bae R12: dead000000200200
R13: ffff888f7efdac00 R14: 000000000000001f R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880fffd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffff768b0b8 CR3: 0000000001a0b000 CR4: 00000000000407e0
Stack:
ffff880fc1a77ce8 ffff880fffd94d68 0000000000000010 ffff880fffd94d50
0000001ff9276a68 ffff880fffd94d60 0000000000000000 000000000000001f
ffff880fffd94d50 0000000000000292 ffff880fc1a77d38 ffff880fffd95d05
Call Trace:
[<ffffffff810bd149>] drain_zone_pages+0x33/0x42
[<ffffffff810cd5a6>] refresh_cpu_vm_stats+0xcc/0x11e
[<ffffffff810cd609>] vmstat_update+0x11/0x43
[<ffffffff8105350f>] process_one_work+0x260/0x389
[<ffffffff8105381a>] worker_thread+0x1e2/0x332
[<ffffffff81053638>] ? process_one_work+0x389/0x389
[<ffffffff810579df>] kthread+0xb3/0xbd
[<ffffffff81053638>] ? process_one_work+0x389/0x389
[<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
[<ffffffff814f522c>] ret_from_fork+0x7c/0xb0
[<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
Code: 48 89 55 c8 48 39 14 08 74 ce 41 83 fe 03 44 0f 44 75 c4 48 83 c2 08 48 89 45 b0 48 89 55 a8 48 8b 45 a8 4c 8b 20 49 8d 5c 24 e0 <48> 8b 53 20 48 8b 43 28 48 89 42 08 48 89 10 48 ba 00 01 10 00
RIP [<ffffffff810bcdcb>] free_pcppages_bulk+0x97/0x329
RSP <ffff880fc1a77c98>
---[ end trace e5413b388b6ea449 ]---
BUG: unable to handle kernel paging request at ffffffffffffffd8
IP: [<ffffffff8105742c>] kthread_data+0xb/0x11
PGD 1a0c067 PUD 1a0e067 PMD 0
Oops: 0000 [#3] SMP
Modules linked in:
CPU: 268 PID: 1722 Comm: kworker/268:1 Tainted: G D 3.11.0-medusa-03121-g757f8ca #184
Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
task: ffff880fc1a74280 ti: ffff880fc1a76000 task.ti: ffff880fc1a76000
RIP: 0010:[<ffffffff8105742c>] [<ffffffff8105742c>] kthread_data+0xb/0x11
RSP: 0018:ffff880fc1a77948 EFLAGS: 00010092
RAX: 0000000000000000 RBX: 000000000000010c RCX: 0000000000000000
RDX: 000000000000000f RSI: 000000000000010c RDI: ffff880fc1a74280
RBP: ffff880fc1a77948 R08: 00000000000442c8 R09: 0000000000000000
R10: dead000000200200 R11: ffff880fc1a742e8 R12: ffff880fc1a74868
R13: ffff880fffd91cc0 R14: ffff880ff9b7a040 R15: 000000000000010c
FS: 0000000000000000(0000) GS:ffff880fffd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 0000000001a0b000 CR4: 00000000000407e0
Stack:
ffff880fc1a77968 ffffffff8105151f ffff880fc1a77968 ffff880fc1a74280
ffff880fc1a77ab8 ffffffff814f2e98 ffff880fc1a76010 0000000000004000
ffff880fc1a74280 0000000000011cc0 ffff880fc1a77fd8 ffff880fc1a77fd8
Call Trace:
[<ffffffff8105151f>] wq_worker_sleeping+0x10/0x82
[<ffffffff814f2e98>] __schedule+0x1b7/0x8f7
[<ffffffff8135d4bd>] ? mix_pool_bytes+0x4a/0x56
[<ffffffff810a5d05>] ? call_rcu_sched+0x16/0x18
[<ffffffff8103f708>] ? release_task+0x3a7/0x3bf
[<ffffffff814f36b5>] schedule+0x61/0x63
[<ffffffff81040e0f>] do_exit+0x903/0x907
[<ffffffff8100529a>] oops_end+0xb9/0xc1
[<ffffffff81005393>] die+0x55/0x5e
[<ffffffff8100341a>] do_general_protection+0x93/0x139
[<ffffffff814f4d82>] general_protection+0x22/0x30
[<ffffffff81009bae>] ? default_idle+0x6/0x8
[<ffffffff810bcdcb>] ? free_pcppages_bulk+0x97/0x329
[<ffffffff810bcd5d>] ? free_pcppages_bulk+0x29/0x329
[<ffffffff810bd149>] drain_zone_pages+0x33/0x42
[<ffffffff810cd5a6>] refresh_cpu_vm_stats+0xcc/0x11e
[<ffffffff810cd609>] vmstat_update+0x11/0x43
[<ffffffff8105350f>] process_one_work+0x260/0x389
[<ffffffff8105381a>] worker_thread+0x1e2/0x332
[<ffffffff81053638>] ? process_one_work+0x389/0x389
[<ffffffff810579df>] kthread+0xb3/0xbd
[<ffffffff81053638>] ? process_one_work+0x389/0x389
[<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
[<ffffffff814f522c>] ret_from_fork+0x7c/0xb0
[<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
Code: 65 48 8b 04 25 40 b7 00 00 48 8b 80 90 05 00 00 48 89 e5 48 8b 40 c8 c9 48 c1 e8 02 83 e0 01 c3 48 8b 87 90 05 00 00 55 48 89 e5 <48> 8b 40 d8 c9 c3 48 3b 3d 67 ca c2 00 55 48 89 e5 75 09 0f bf
RIP [<ffffffff8105742c>] kthread_data+0xb/0x11
RSP <ffff880fc1a77948>
CR2: ffffffffffffffd8
---[ end trace e5413b388b6ea44a ]---
Fixing recursive fault but reboot is needed!

I'm testing on a 528-core machine with ~2TB of memory and THP on. The
test case works like this:

- Spawn 512 threads using pthread_create and pin each thread to a
separate cpu
- Each thread allocates 512MB, local to its cpu
- Threads are sent a "go" signal, and all threads then begin touching
the first byte of each 4k chunk of their 512MB simultaneously
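
For illustration, the steps above can be sketched roughly as follows.
This is not the memscale program itself. The names run_touch_test() and
worker_main() are made up, the "go" signal is assumed to be a condition
variable, and "local to its cpu" is approximated by pinning before the
anonymous mmap() and relying on first-touch placement. Pinning is best
effort, and CPU numbers wrap if there are fewer CPUs than threads.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define CHUNK 4096UL /* one write per 4k chunk, as in the description */

static pthread_mutex_t start_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t start_cond = PTHREAD_COND_INITIALIZER;
static int nready;  /* workers that have finished their setup */
static int go_flag; /* the "go" signal (assumed mechanism) */

struct worker {
	pthread_t tid;
	int cpu;      /* CPU this thread tries to pin itself to */
	size_t bytes; /* size of this thread's private region */
	long touched; /* number of chunks written */
};

static void *worker_main(void *arg)
{
	struct worker *w = arg;
	cpu_set_t set;
	char *buf;
	size_t off;

	/* Pin before allocating; a failure here is ignored (best effort). */
	CPU_ZERO(&set);
	CPU_SET(w->cpu, &set);
	(void)pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	buf = mmap(NULL, w->bytes, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* Report ready, then block until the main thread says go. */
	pthread_mutex_lock(&start_lock);
	nready++;
	pthread_cond_broadcast(&start_cond);
	while (!go_flag)
		pthread_cond_wait(&start_cond, &start_lock);
	pthread_mutex_unlock(&start_lock);

	if (buf == MAP_FAILED)
		return NULL;
	for (off = 0; off < w->bytes; off += CHUNK) {
		buf[off] = 1; /* first byte of each chunk, faults the page in */
		w->touched++;
	}
	munmap(buf, w->bytes);
	return NULL;
}

/* Returns the total number of chunks touched, or -1 on setup failure. */
long run_touch_test(int nthreads, size_t bytes)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	struct worker *w;
	long total = 0;
	int i, started = 0;

	if (nthreads <= 0 || ncpus <= 0)
		return -1;
	w = calloc(nthreads, sizeof(*w));
	if (!w)
		return -1;
	nready = 0;
	go_flag = 0;

	for (i = 0; i < nthreads; i++) {
		w[i].cpu = (int)(i % ncpus); /* wraps if CPUs < threads */
		w[i].bytes = bytes;
		if (pthread_create(&w[i].tid, NULL, worker_main, &w[i]))
			break;
		started++;
	}

	/* Wait for every started worker, then release them all at once. */
	pthread_mutex_lock(&start_lock);
	while (nready < started)
		pthread_cond_wait(&start_cond, &start_lock);
	go_flag = 1;
	pthread_cond_broadcast(&start_cond);
	pthread_mutex_unlock(&start_lock);

	for (i = 0; i < started; i++) {
		pthread_join(w[i].tid, NULL);
		total += w[i].touched;
	}
	free(w);
	return started == nthreads ? total : -1;
}
```

Calling run_touch_test(512, 512UL << 20) from main() would correspond to
the sizes described above. Smaller values are enough to check that the
sketch itself runs.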

I'm working on debugging the issue now, but I thought I'd get this out
to everyone in case they have some input. I'll try to get my test
program cleaned up and posted somewhere today so that others can try it
out as well.

- Alex