Re: [intel-sgx-kernel-dev] [PATCH v7 4/8] intel_sgx: driver for Intel Software Guard Extensions
From: Sean Christopherson
Date: Tue Dec 12 2017 - 16:32:34 EST
On Thu, 2017-12-07 at 18:05 +0200, Jarkko Sakkinen wrote:
> On Thu, Dec 07, 2017 at 02:46:39PM +0000, Christopherson, Sean J wrote:
> >
> > >
> > > + for (i = 0; i < 2; i++) {
> > > + va_page = list_first_entry(&encl->va_pages,
> > > + ÂÂÂstruct sgx_va_page, list);
> > > + va_offset = sgx_alloc_va_slot(va_page);
> > > + if (va_offset < PAGE_SIZE)
> > > + break;
> > > +
> > > + list_move_tail(&va_page->list, &encl->va_pages);
> > > + }
> > This is broken, there is no guarantee that the next VA page will have
> > a free slot.ÂÂYou have to walk over all VA pages to guarantee a slot
> > is found, e.g. this caused EWB and ELDU errors.
> I did run some extensive stress tests on this and did not experience any
> issues. Full VA pages are always put to the end. Please point me to the
> test where this breaks so that I can fix the issue if it persists.
>
> >
> > Querying list.next to determine if an encl_page is resident in the EPC
> > is ugly and unintuitive, and depending on list's internal state seems
> > dangerous.ÂÂWhy not use a flag in the encl_page, e.g. as in the patch
> > I submitted almost 8 months ago for combining epc_page and va_page into
> > a union?ÂÂAnd, the encl's SGX_ENCL_SECS_EVICTED flag can be dropped if
> > a flag is added to indicate whether or not any encl_page is resident in
> > the EPC.
> >
> > https://lists.01.org/pipermail/intel-sgx-kernel-dev/2017-April/000570.html
> I think it is better to just zero list entry and do list_empty test. You
> correct that checking that with poison is ugly.
Except this whole approach breaks if you do list_del_init instead of
list_del. ÂInferring the residency of a page based on whether or not
it's on a list AND how the page was removed from said list is fragile.
And, the lack of an explicit flag makes it quite painful to debug any
issues, e.g. it's difficult to identify the call site of list_del.
Case in point, I spent the better part of a day debugging a #PF BUG
in sgx_eldu because it tried to directly deference an EPC page. ÂThe
list check in sgx_fault_page failed to detect an already-faulted page
because sgx_isolate_pages calls list_del and releases the enclave's
mutex long before the page is actually evicted.
[ÂÂ656.093093] BUG: unable to handle kernel paging request at 0000000480f23000
[ÂÂ656.095157] IP: sgx_eldu+0xc1/0x3c0 [intel_sgx]
[ÂÂ656.095760] PGD 469f6a067 P4D 469f6a067 PUD 0Â
[ÂÂ656.096371] Oops: 0000 [#1] SMP
[ÂÂ656.096818] Modules linked in: intel_sgx scsi_transport_iscsi bridge stp llc
[ÂÂ656.097747] CPU: 3 PID: 5362 Comm: lsdt Not tainted 4.14.0+ #5
[ÂÂ656.098514] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ÂÂ656.099472] task: ffffa0af5c1b9d80 task.stack: ffffacd9473e0000
[ÂÂ656.100233] RIP: 0010:sgx_eldu+0xc1/0x3c0 [intel_sgx]
[ÂÂ656.100843] RSP: 0000:ffffacd9473e3c40 EFLAGS: 00010286
[ÂÂ656.101491] RAX: 0000000480f23000 RBX: ffffacd94a29d000 RCX: 0000000000000000
[ÂÂ656.102369] RDX: 0000000000000000 RSI: ffffa0af54424b90 RDI: 0000000485224000
[ÂÂ656.103225] RBP: ffffacd9473e3cf0 R08: ffffef4f5180c59c R09: ffffa0af54424b68
[ÂÂ656.104102] R10: ffffacd9473e3ab8 R11: 0000000000000040 R12: ffffef4f513e7980
[ÂÂ656.104970] R13: ffffa0af693fe5e0 R14: ffffef4f5180c580 R15: ffffa0af6c885a00
[ÂÂ656.105851] FS:ÂÂ00007f42ea7fc700(0000) GS:ffffa0af7fcc0000(0000) knlGS:0000000000000000
[ÂÂ656.106767] CS:ÂÂ0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ÂÂ656.107470] CR2: 0000000480f23000 CR3: 0000000467fc6004 CR4: 00000000003606e0
[ÂÂ656.108244] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ÂÂ656.109060] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ÂÂ656.109880] Call Trace:
[ÂÂ656.110224]ÂÂ? __wake_up_common_lock+0x8e/0xc0
[ÂÂ656.110740]ÂÂsgx_fault_page+0x1d5/0x390 [intel_sgx]
[ÂÂ656.111319]ÂÂ? sgx_fault_page+0x1d5/0x390 [intel_sgx]
[ÂÂ656.111917]ÂÂsgx_vma_fault+0x17/0x40 [intel_sgx]
[ÂÂ656.112517]ÂÂ__do_fault+0x1c/0x60
[ÂÂ656.112916]ÂÂ__handle_mm_fault+0x98c/0xeb0
[ÂÂ656.113385]ÂÂ? set_next_entity+0x109/0x6e0
[ÂÂ656.113876]ÂÂhandle_mm_fault+0xcc/0x1c0
[ÂÂ656.114423]ÂÂ__do_page_fault+0x262/0x4f0
[ÂÂ656.114956]ÂÂdo_page_fault+0x2e/0xe0
[ÂÂ656.115488]ÂÂdo_async_page_fault+0x1a/0x80
[ÂÂ656.116071]ÂÂasync_page_fault+0x22/0x30
[ÂÂ656.118384] RIP: 0033:0x5db36e
[ÂÂ656.120406] RSP: 002b:00007f42ea7fbbf0 EFLAGS: 00000202
[ÂÂ656.121970] RAX: 0000000000000003 RBX: 00007f42e624e000 RCX: 00000000005db36e
[ÂÂ656.123512] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ÂÂ656.125023] RBP: 00007f42ea7fbc40 R08: 0000000000000000 R09: 0000000000000000
[ÂÂ656.126369] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ÂÂ656.127581] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ÂÂ656.128812] Code: 02 00 00 48 c7 85 68 ff ff ff 00 00 00 00 31 db 80 7d 8c 00
[ÂÂ656.132076] RIP: sgx_eldu+0xc1/0x3c0 [intel_sgx] RSP: ffffacd9473e3c40
[ÂÂ656.133211] CR2: 0000000480f23000
[ÂÂ656.133975] ---[ end trace e128b086ca834f1a ]---
> Last flag bit wll be needed for the SGX_ENCL_PAGE_TRIM. It is useful to
> have the flag in the enclave in order to be able to pack struct
> sgx_encl_page.
>
> /Jarkko