Re: [PATCH v17 18/23] platform/x86: Intel SGX driver

From: Jarkko Sakkinen
Date: Mon Dec 17 2018 - 08:39:43 EST


On Mon, Dec 17, 2018 at 03:28:59PM +0200, Jarkko Sakkinen wrote:
> On Fri, Dec 14, 2018 at 04:06:27PM -0800, Sean Christopherson wrote:
> > [ 504.149548] ------------[ cut here ]------------
> > [ 504.149550] kernel BUG at /home/sean/go/src/kernel.org/linux/mm/mmap.c:669!
> > [ 504.150288] invalid opcode: 0000 [#1] SMP
> > [ 504.150614] CPU: 2 PID: 237 Comm: kworker/u20:2 Not tainted 4.20.0-rc2+ #267
> > [ 504.151165] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > [ 504.151818] Workqueue: sgx-encl-wq sgx_encl_release_worker
> > [ 504.152267] RIP: 0010:__vma_adjust+0x64a/0x820
> > [ 504.152626] Code: ff 48 89 50 18 e9 6f fc ff ff 4c 8b ab 88 00 00 00 45 31 e4 e9 61 fb ff ff 31 c0 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 49 89 de 49 83 c6 20 0f 84 06 fe ff ff 49 8d 7e e0 e8 fe ee
> > [ 504.154109] RSP: 0000:ffffc900004ebd60 EFLAGS: 00010206
> > [ 504.154535] RAX: 00007fd92ef7e000 RBX: ffff888467af16c0 RCX: ffff888467af16e0
> > [ 504.155104] RDX: ffff888458fd09e0 RSI: 00007fd954021000 RDI: ffff88846bf9e798
> > [ 504.155673] RBP: ffff888467af1480 R08: ffff88845bea2000 R09: 0000000000000000
> > [ 504.156242] R10: 0000000080000000 R11: fefefefefefefeff R12: 0000000000000000
> > [ 504.156810] R13: ffff88846bf9e790 R14: ffff888467af1b70 R15: ffff888467af1b60
> > [ 504.157378] FS: 0000000000000000(0000) GS:ffff88846f700000(0000) knlGS:0000000000000000
> > [ 504.158021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 504.158483] CR2: 00007f2c56e99000 CR3: 0000000005009001 CR4: 0000000000360ee0
> > [ 504.159054] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 504.159623] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 504.160193] Call Trace:
> > [ 504.160406] __split_vma+0x16f/0x180
> > [ 504.160706] ? __switch_to_asm+0x40/0x70
> > [ 504.161024] __do_munmap+0xfb/0x450
> > [ 504.161308] sgx_encl_release_worker+0x44/0x70
> > [ 504.161675] process_one_work+0x200/0x3f0
> > [ 504.162004] worker_thread+0x2d/0x3d0
> > [ 504.162301] ? process_one_work+0x3f0/0x3f0
> > [ 504.162645] kthread+0x113/0x130
> > [ 504.162912] ? kthread_park+0x90/0x90
> > [ 504.163209] ret_from_fork+0x35/0x40
> > [ 504.163503] Modules linked in: bridge stp llc
> > [ 504.163866] ---[ end trace 83076139fc25e3e0 ]---
>
> There was a race between the release and swapping code that I thought I
> had fixed, and this looks like a race there. I have to recheck what I did
> not consider. Anyway, I thought I would share this in case you have time
> to look at it. That is most probably the part that is now out of sync.

I think I found it. I carelessly made sgx_encl_release() use
sgx_invalidate(), which does not delete pages when the enclave is
already marked as dead. I made that change after I had fixed the race
that was there in the first place, which is why I was puzzled that it
suddenly reappeared.

It would be nice to use sgx_invalidate() in release as well, for the
sake of consistent semantics, so maybe just delete this check:

	if (encl->flags & SGX_ENCL_DEAD)
		return;

?
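
To make the failure mode concrete, here is a minimal userspace sketch of
the pattern, not the driver code itself. All names (model_encl,
model_invalidate, model_release, MODEL_ENCL_DEAD, the skip_if_dead
parameter) are hypothetical stand-ins, and the page bookkeeping is
reduced to a counter. It only illustrates how an early return on the
dead flag can leave pages behind when release goes through the same
helper:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SGX_ENCL_DEAD; not the driver's value. */
#define MODEL_ENCL_DEAD 0x1u

/* Toy enclave: a flags word plus a count of pages still attached. */
struct model_encl {
	unsigned int flags;
	size_t nr_pages;
};

/*
 * Models the invalidate helper. With skip_if_dead set, it returns
 * early for an already-dead enclave and leaves nr_pages untouched,
 * which is the behaviour described above. With skip_if_dead clear
 * (the check deleted), it always drops the pages.
 */
static void model_invalidate(struct model_encl *encl, bool skip_if_dead)
{
	if (skip_if_dead && (encl->flags & MODEL_ENCL_DEAD))
		return;

	encl->flags |= MODEL_ENCL_DEAD;
	encl->nr_pages = 0;
}

/*
 * Models release going through the invalidate helper. Returns the
 * number of pages that are still attached afterwards.
 */
static size_t model_release(struct model_encl *encl, bool skip_if_dead)
{
	model_invalidate(encl, skip_if_dead);
	return encl->nr_pages;
}
```

In this model, releasing an enclave that is already marked dead leaves
its pages attached while the check is present, and drops them once the
check is gone. Whether dropping the check is safe for the other callers
of sgx_invalidate() is a separate question that the sketch does not
answer.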

/Jarkko