Re: [intel-sgx-kernel-dev] [PATCH v7 4/8] intel_sgx: driver for Intel Software Guard Extensions

From: Jarkko Sakkinen
Date: Thu Dec 14 2017 - 08:03:27 EST


On Tue, Dec 12, 2017 at 01:32:28PM -0800, Sean Christopherson wrote:
> On Thu, 2017-12-07 at 18:05 +0200, Jarkko Sakkinen wrote:
> > On Thu, Dec 07, 2017 at 02:46:39PM +0000, Christopherson, Sean J wrote:
> > >
> > > >
> > > > + for (i = 0; i < 2; i++) {
> > > > + va_page = list_first_entry(&encl->va_pages,
> > > > +    struct sgx_va_page, list);
> > > > + va_offset = sgx_alloc_va_slot(va_page);
> > > > + if (va_offset < PAGE_SIZE)
> > > > + break;
> > > > +
> > > > + list_move_tail(&va_page->list, &encl->va_pages);
> > > > + }
> > > This is broken, there is no guarantee that the next VA page will have
> > > a free slot.  You have to walk over all VA pages to guarantee a slot
> > > is found, e.g. this caused EWB and ELDU errors.
> > I did run some extensive stress tests on this and did not experience any
> > issues. Full VA pages are always put to the end. Please point me to the
> > test where this breaks so that I can fix the issue if it persists.
> >
> > >
> > > Querying list.next to determine if an encl_page is resident in the EPC
> > > is ugly and unintuitive, and depending on list's internal state seems
> > > dangerous.  Why not use a flag in the encl_page, e.g. as in the patch
> > > I submitted almost 8 months ago for combining epc_page and va_page into
> > > a union?  And, the encl's SGX_ENCL_SECS_EVICTED flag can be dropped if
> > > a flag is added to indicate whether or not any encl_page is resident in
> > > the EPC.
> > >
> > > https://lists.01.org/pipermail/intel-sgx-kernel-dev/2017-April/000570.html
> > I think it is better to just zero list entry and do list_empty test. You
> > correct that checking that with poison is ugly.
>
> Except this whole approach breaks if you do list_del_init instead of
> list_del.  Inferring the residency of a page based on whether or not
> it's on a list AND how the page was removed from said list is fragile.
> And, the lack of an explicit flag makes it quite painful to debug any
> issues, e.g. it's difficult to identify the call site of list_del.
>
> Case in point, I spent the better part of a day debugging a #PF BUG
> in sgx_eldu because it tried to directly deference an EPC page.  The
> list check in sgx_fault_page failed to detect an already-faulted page
> because sgx_isolate_pages calls list_del and releases the enclave's
> mutex long before the page is actually evicted.
>
>
> [  656.093093] BUG: unable to handle kernel paging request at 0000000480f23000
> [  656.095157] IP: sgx_eldu+0xc1/0x3c0 [intel_sgx]
> [  656.095760] PGD 469f6a067 P4D 469f6a067 PUD 0 
> [  656.096371] Oops: 0000 [#1] SMP
> [  656.096818] Modules linked in: intel_sgx scsi_transport_iscsi bridge stp llc
> [  656.097747] CPU: 3 PID: 5362 Comm: lsdt Not tainted 4.14.0+ #5
> [  656.098514] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [  656.099472] task: ffffa0af5c1b9d80 task.stack: ffffacd9473e0000
> [  656.100233] RIP: 0010:sgx_eldu+0xc1/0x3c0 [intel_sgx]
> [  656.100843] RSP: 0000:ffffacd9473e3c40 EFLAGS: 00010286
> [  656.101491] RAX: 0000000480f23000 RBX: ffffacd94a29d000 RCX: 0000000000000000
> [  656.102369] RDX: 0000000000000000 RSI: ffffa0af54424b90 RDI: 0000000485224000
> [  656.103225] RBP: ffffacd9473e3cf0 R08: ffffef4f5180c59c R09: ffffa0af54424b68
> [  656.104102] R10: ffffacd9473e3ab8 R11: 0000000000000040 R12: ffffef4f513e7980
> [  656.104970] R13: ffffa0af693fe5e0 R14: ffffef4f5180c580 R15: ffffa0af6c885a00
> [  656.105851] FS:  00007f42ea7fc700(0000) GS:ffffa0af7fcc0000(0000) knlGS:0000000000000000
> [  656.106767] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  656.107470] CR2: 0000000480f23000 CR3: 0000000467fc6004 CR4: 00000000003606e0
> [  656.108244] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  656.109060] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  656.109880] Call Trace:
> [  656.110224]  ? __wake_up_common_lock+0x8e/0xc0
> [  656.110740]  sgx_fault_page+0x1d5/0x390 [intel_sgx]
> [  656.111319]  ? sgx_fault_page+0x1d5/0x390 [intel_sgx]
> [  656.111917]  sgx_vma_fault+0x17/0x40 [intel_sgx]
> [  656.112517]  __do_fault+0x1c/0x60
> [  656.112916]  __handle_mm_fault+0x98c/0xeb0
> [  656.113385]  ? set_next_entity+0x109/0x6e0
> [  656.113876]  handle_mm_fault+0xcc/0x1c0
> [  656.114423]  __do_page_fault+0x262/0x4f0
> [  656.114956]  do_page_fault+0x2e/0xe0
> [  656.115488]  do_async_page_fault+0x1a/0x80
> [  656.116071]  async_page_fault+0x22/0x30
> [  656.118384] RIP: 0033:0x5db36e
> [  656.120406] RSP: 002b:00007f42ea7fbbf0 EFLAGS: 00000202
> [  656.121970] RAX: 0000000000000003 RBX: 00007f42e624e000 RCX: 00000000005db36e
> [  656.123512] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [  656.125023] RBP: 00007f42ea7fbc40 R08: 0000000000000000 R09: 0000000000000000
> [  656.126369] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [  656.127581] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [  656.128812] Code: 02 00 00 48 c7 85 68 ff ff ff 00 00 00 00 31 db 80 7d 8c 00
> [  656.132076] RIP: sgx_eldu+0xc1/0x3c0 [intel_sgx] RSP: ffffacd9473e3c40
> [  656.133211] CR2: 0000000480f23000
> [  656.133975] ---[ end trace e128b086ca834f1a ]---
>
> > Last flag bit wll be needed for the SGX_ENCL_PAGE_TRIM. It is useful to
> > have the flag in the enclave in order to be able to pack struct
> > sgx_encl_page.
> >
> > /Jarkko

You are correct. It is too fragile. I'll squeeze the flag in.

/Jarkko