Re: [intel-sgx-kernel-dev] [PATCH v7 4/8] intel_sgx: driver for Intel Software Guard Extensions
From: Jarkko Sakkinen
Date: Thu Dec 14 2017 - 08:03:27 EST
On Tue, Dec 12, 2017 at 01:32:28PM -0800, Sean Christopherson wrote:
> On Thu, 2017-12-07 at 18:05 +0200, Jarkko Sakkinen wrote:
> > On Thu, Dec 07, 2017 at 02:46:39PM +0000, Christopherson, Sean J wrote:
> > >
> > > >
> > > > + for (i = 0; i < 2; i++) {
> > > > + va_page = list_first_entry(&encl->va_pages,
> > > > + struct sgx_va_page, list);
> > > > + va_offset = sgx_alloc_va_slot(va_page);
> > > > + if (va_offset < PAGE_SIZE)
> > > > + break;
> > > > +
> > > > + list_move_tail(&va_page->list, &encl->va_pages);
> > > > + }
> > > This is broken, there is no guarantee that the next VA page will have
> > > a free slot. You have to walk over all VA pages to guarantee a slot
> > > is found, e.g. this caused EWB and ELDU errors.
> > I did run some extensive stress tests on this and did not experience any
> > issues. Full VA pages are always put to the end. Please point me to the
> > test where this breaks so that I can fix the issue if it persists.
> >
> > >
> > > Querying list.next to determine if an encl_page is resident in the EPC
> > > is ugly and unintuitive, and depending on list's internal state seems
> > > dangerous. Why not use a flag in the encl_page, e.g. as in the patch
> > > I submitted almost 8 months ago for combining epc_page and va_page into
> > > a union? And, the encl's SGX_ENCL_SECS_EVICTED flag can be dropped if
> > > a flag is added to indicate whether or not any encl_page is resident in
> > > the EPC.
> > >
> > > https://lists.01.org/pipermail/intel-sgx-kernel-dev/2017-April/000570.html
> > I think it is better to just zero list entry and do list_empty test. You
> > correct that checking that with poison is ugly.
>
> Except this whole approach breaks if you do list_del_init instead of
> list_del. Inferring the residency of a page based on whether or not
> it's on a list AND how the page was removed from said list is fragile.
> And, the lack of an explicit flag makes it quite painful to debug any
> issues, e.g. it's difficult to identify the call site of list_del.
>
> Case in point, I spent the better part of a day debugging a #PF BUG
> in sgx_eldu because it tried to directly deference an EPC page. The
> list check in sgx_fault_page failed to detect an already-faulted page
> because sgx_isolate_pages calls list_del and releases the enclave's
> mutex long before the page is actually evicted.
>
>
> [ 656.093093] BUG: unable to handle kernel paging request at 0000000480f23000
> [ 656.095157] IP: sgx_eldu+0xc1/0x3c0 [intel_sgx]
> [ 656.095760] PGD 469f6a067 P4D 469f6a067 PUD 0
> [ 656.096371] Oops: 0000 [#1] SMP
> [ 656.096818] Modules linked in: intel_sgx scsi_transport_iscsi bridge stp llc
> [ 656.097747] CPU: 3 PID: 5362 Comm: lsdt Not tainted 4.14.0+ #5
> [ 656.098514] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [ 656.099472] task: ffffa0af5c1b9d80 task.stack: ffffacd9473e0000
> [ 656.100233] RIP: 0010:sgx_eldu+0xc1/0x3c0 [intel_sgx]
> [ 656.100843] RSP: 0000:ffffacd9473e3c40 EFLAGS: 00010286
> [ 656.101491] RAX: 0000000480f23000 RBX: ffffacd94a29d000 RCX: 0000000000000000
> [ 656.102369] RDX: 0000000000000000 RSI: ffffa0af54424b90 RDI: 0000000485224000
> [ 656.103225] RBP: ffffacd9473e3cf0 R08: ffffef4f5180c59c R09: ffffa0af54424b68
> [ 656.104102] R10: ffffacd9473e3ab8 R11: 0000000000000040 R12: ffffef4f513e7980
> [ 656.104970] R13: ffffa0af693fe5e0 R14: ffffef4f5180c580 R15: ffffa0af6c885a00
> [ 656.105851] FS: 00007f42ea7fc700(0000) GS:ffffa0af7fcc0000(0000) knlGS:0000000000000000
> [ 656.106767] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 656.107470] CR2: 0000000480f23000 CR3: 0000000467fc6004 CR4: 00000000003606e0
> [ 656.108244] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 656.109060] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 656.109880] Call Trace:
> [ 656.110224] ? __wake_up_common_lock+0x8e/0xc0
> [ 656.110740] sgx_fault_page+0x1d5/0x390 [intel_sgx]
> [ 656.111319] ? sgx_fault_page+0x1d5/0x390 [intel_sgx]
> [ 656.111917] sgx_vma_fault+0x17/0x40 [intel_sgx]
> [ 656.112517] __do_fault+0x1c/0x60
> [ 656.112916] __handle_mm_fault+0x98c/0xeb0
> [ 656.113385] ? set_next_entity+0x109/0x6e0
> [ 656.113876] handle_mm_fault+0xcc/0x1c0
> [ 656.114423] __do_page_fault+0x262/0x4f0
> [ 656.114956] do_page_fault+0x2e/0xe0
> [ 656.115488] do_async_page_fault+0x1a/0x80
> [ 656.116071] async_page_fault+0x22/0x30
> [ 656.118384] RIP: 0033:0x5db36e
> [ 656.120406] RSP: 002b:00007f42ea7fbbf0 EFLAGS: 00000202
> [ 656.121970] RAX: 0000000000000003 RBX: 00007f42e624e000 RCX: 00000000005db36e
> [ 656.123512] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 656.125023] RBP: 00007f42ea7fbc40 R08: 0000000000000000 R09: 0000000000000000
> [ 656.126369] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 656.127581] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 656.128812] Code: 02 00 00 48 c7 85 68 ff ff ff 00 00 00 00 31 db 80 7d 8c 00
> [ 656.132076] RIP: sgx_eldu+0xc1/0x3c0 [intel_sgx] RSP: ffffacd9473e3c40
> [ 656.133211] CR2: 0000000480f23000
> [ 656.133975] ---[ end trace e128b086ca834f1a ]---
>
> > Last flag bit wll be needed for the SGX_ENCL_PAGE_TRIM. It is useful to
> > have the flag in the enclave in order to be able to pack struct
> > sgx_encl_page.
> >
> > /Jarkko
You are correct. It is too fragile. I'll squeeze the flag in.
/Jarkko