Re: [PATCH 1/2] x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxd

From: jarkko@xxxxxxxxxx
Date: Mon Sep 05 2022 - 05:45:13 EST


On Mon, Sep 05, 2022 at 07:50:33AM +0000, Huang, Kai wrote:
> On Sat, 2022-09-03 at 13:26 +0300, Jarkko Sakkinen wrote:
> > >   static int ksgxd(void *p)
> > >   {
> > > + unsigned long left_dirty;
> > > +
> > >    set_freezable();
> > >  
> > >    /*
> > >    * Sanitize pages in order to recover from kexec(). The 2nd pass is
> > >    * required for SECS pages, whose child pages blocked EREMOVE.
> > >    */
> > > - __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > - __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > + pr_debug("%ld unsanitized pages\n", left_dirty);
> >                   %lu
> >
>
> I assume the intention is to print out the unsanitized SECS pages, but what is
> the value of printing it? To me it doesn't provide any useful information, even
> for debug.

How do you measure "useful"?

If for some reason there were unsanitized pages, I would at least
want to know how many were left after the first pass.

Plus it does zero harm unless you explicitly turn it on.
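
To illustrate why the first return value can be non-zero, here is a rough
user-space sketch of the two-pass idea from the quoted comment: a SECS page
stays dirty while any of its child pages is still dirty, so it only gets
cleaned on the second pass. Everything named demo_* is hypothetical and only
models the behaviour; it is not the kernel's __sgx_sanitize_pages() and it
does not issue EREMOVE.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an EPC page; not the kernel's data structure. */
struct demo_page {
	bool dirty;     /* still needs to be sanitized */
	int secs_index; /* index of the owning SECS page, or -1 for a SECS page */
};

/* A SECS page cannot be cleaned while one of its children is still dirty. */
static bool demo_has_dirty_child(const struct demo_page *pages, size_t n,
				 size_t secs)
{
	for (size_t i = 0; i < n; i++)
		if (pages[i].dirty && pages[i].secs_index == (int)secs)
			return true;
	return false;
}

/* One pass over all pages; returns the number of pages left dirty. */
unsigned long demo_sanitize_pass(struct demo_page *pages, size_t n)
{
	unsigned long left_dirty = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].dirty)
			continue;
		if (pages[i].secs_index < 0 &&
		    demo_has_dirty_child(pages, n, i)) {
			left_dirty++; /* blocked by a child, retry next pass */
			continue;
		}
		pages[i].dirty = false;
	}
	return left_dirty;
}
```

With one SECS page listed ahead of its two children, the first call leaves
one page dirty and the second call leaves none, which is the kind of count
the pr_debug() line would report after the first pass.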

> Besides, the first call of __sgx_sanitize_pages() can return 0, either because
> kthread_should_stop() is true or because all EPC pages were EREMOVEd
> successfully. So in this case the kernel will print out "0 unsanitized pages\n",
> which doesn't make a lot of sense?
>
> > >  
> > > - /* sanity check: */
> > > - WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > + /*
> > > + * Never expected to happen in a working driver. If it happens the bug
> > > + * is expected to be in the sanitization process, but successfully
> > > + * sanitized pages are still valid and driver can be used and most
> > > + * importantly debugged without issues. To put short, the global state
> > > + * of kernel is not corrupted so no reason to do any more complicated
> > > + * rollback.
> > > + */
> > > + if (left_dirty)
> > > + pr_err("%ld unsanitized pages\n", left_dirty);
> >                         %lu
>
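
On the %lu remark: left_dirty is an unsigned long, so the matching conversion
is %lu. A small user-space sketch, assuming snprintf() treats this conversion
the same way printk() does; demo_format_left_dirty() is a made-up helper and
not part of the patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the count the way the pr_err() line would,
 * using %lu to match the unsigned long argument.
 * Returns whatever snprintf() returns.
 */
int demo_format_left_dirty(char *buf, size_t size, unsigned long left_dirty)
{
	return snprintf(buf, size, "%lu unsanitized pages\n", left_dirty);
}
```

With %lu even the largest unsigned long value is printed as a positive
number, whereas a signed conversion could show it as negative.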
> No strong opinion, but IMHO we can still just WARN() when it is driver bug:
>
> 1) There's no guarantee the driver can continue to work if it has a bug;
>
> 2) It is fine that WARN() can panic() the kernel when
> /proc/sys/kernel/panic_on_warn is set. It's expected behaviour. If I
> understand correctly, there are many places in the kernel that use WARN() to
> catch bugs.
>
> In fact, we can even view WARN() as an advantage. For instance, if the existing
> code had only printed out "xx unsanitized pages", people might not even have
> noticed this bug.
>
> From this perspective, if you want to print out, I think you may want to make
> the message more visible, so that people know it's a driver bug. Perhaps
> something like "The driver has a bug, please report to kernel community..", etc.
>
> 3) Changing WARN() to pr_err() conceptually isn't mandatory to fix this
> particular bug. So, it's kinda mixing things together.
>
> But again, no strong opinion here.
>
> --
> Thanks,
> -Kai
>
>

BR, Jarkko