Re: [RFC 0/4] Fix machine check recovery for copy_from_user

From: Aili Yao
Date: Wed Apr 07 2021 - 22:13:44 EST


On Thu, 25 Mar 2021 17:02:31 -0700
Tony Luck <tony.luck@xxxxxxxxx> wrote:

> Maybe this is the way forward? I made some poor choices before
> to treat poison consumption in the kernel when accessing user data
> (get_user() or copy_from_user()) ... in particular assuming that
> the right action was sending a SIGBUS to the task as if it had
> synchronously accessed the poison location.
>
> First three patches may need to be combined (or broken up differently)
> for bisectability. But they are presented separately here since they
> touch separate parts of the problem.
>
> Second part is definitely incomplete. But I'd like to check that it
> is the right approach before expending more brain cells in the maze
> of nested macros that is lib/iov_iter.c
>
> Last part has been posted before. It covers the case where the kernel
> takes more than one swing at reading poison data before returning to
> user.
>
> Tony Luck (4):
> x86/mce: Fix copyin code to return -EFAULT on machine check.
> mce/iter: Check for copyin failure & return error up stack
> mce/copyin: fix to not SIGBUS when copying from user hits poison
> x86/mce: Avoid infinite loop for copy from user recovery
>
> arch/x86/kernel/cpu/mce/core.c | 63 +++++++++++++++++++++---------
> arch/x86/kernel/cpu/mce/severity.c | 2 -
> arch/x86/lib/copy_user_64.S | 18 +++++----
> fs/iomap/buffered-io.c | 8 +++-
> include/linux/sched.h | 2 +-
> include/linux/uio.h | 2 +-
> lib/iov_iter.c | 15 ++++++-
> 7 files changed, 77 insertions(+), 33 deletions(-)
>

I have one scenario that I'd like you to take into account:

If a copyin hits poison and, with your patch, write() returns an error, the user
process may check the return value and exit on error. The poisoned page will then
be freed and may be allocated to another process or to the kernel itself. When the
new owner initializes it, this will trigger an SRAO (software recoverable action
optional) machine check. If the page is in use by the kernel, we may do nothing
about it, the kernel may keep touching it, and that can lead to a panic.
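To make the user-side half of this scenario concrete, here is a minimal sketch of what the process sees, assuming that (as the subject of patch 1 says) poison consumed during copyin is reported as -EFAULT. A machine check cannot easily be triggered from user space, so the sketch substitutes an unmapped address, which takes the ordinary page-fault path to the same EFAULT from write(). The helper name write_errno_for_bad_buffer() is hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: write() from a buffer the kernel cannot read
 * and return the errno observed (0 if write() unexpectedly succeeded,
 * -1 if the pipe could not be created).
 *
 * Assumption: an unmapped address stands in for a poisoned page. It
 * faults via the normal page-fault path, not a machine check, but both
 * are meant to surface to user space as EFAULT.
 */
static int write_errno_for_bad_buffer(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Address 1 lies below mmap_min_addr, so it is never mapped. */
    const void *bad = (const void *)(uintptr_t)1;
    ssize_t ret = write(fds[1], bad, 16);
    int err = (ret < 0) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

A program that behaves as described would report strerror(err) and exit at this point; on exit its pages are released, which is the step where the poisoned page could be handed to a new owner.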

Is this what we expect?

Thanks!
Aili Yao