Re: [RFC PATCH v2 1/2] uaccess: Check no rescheduling function is called in unsafe region

From: Valentin Schneider
Date: Mon Jan 14 2019 - 07:03:32 EST


Hi,

On 03/12/2018 13:55, Julien Thierry wrote:
> While running a user_access regions, it is not supported to reschedule.
> Add an overridable primitive to indicate whether a user_access region is
> active and check that this is not the case when calling rescheduling
> functions.
>
> Also, add a comment clarifying the behaviour of user_access regions.
>
> Signed-off-by: Julien Thierry <julien.thierry@xxxxxxx>
> ---
> include/linux/kernel.h | 6 ++++--
> include/linux/uaccess.h | 11 +++++++++++
> kernel/sched/core.c | 19 +++++++++++++++++++
> 3 files changed, 34 insertions(+), 2 deletions(-)
>
> I'm not sure these are the best locations to check this but I was hoping
> this patch could start the discussion.
>
> Should I move the check? Should I add a config option to conditionally
> build those checks?
>

I was going to say it's already under DEBUG_ATOMIC_SLEEP, but that's only
true for the __might_sleep() bit actually.

I think it'd make sense to blanket that under a config, but using
DEBUG_ATOMIC_SLEEP for that is a bit too much. What about a
DEBUG_UACCESS_SLEEP?

> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index d6aac75..fe0e984 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -237,11 +237,13 @@
> struct pt_regs;
> struct user;
>
> +extern void __might_resched(const char *file, int line);
> #ifdef CONFIG_PREEMPT_VOLUNTARY
> extern int _cond_resched(void);
> -# define might_resched() _cond_resched()
> +# define might_resched() \
> + do { __might_resched(__FILE__, __LINE__); _cond_resched(); } while (0)
> #else
> -# define might_resched() do { } while (0)
> +# define might_resched() __might_resched(__FILE__, __LINE__)> #endif
>
> #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index efe79c1..50adb84 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -266,6 +266,13 @@ static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
> #define probe_kernel_address(addr, retval) \
> probe_kernel_read(&retval, addr, sizeof(retval))
>
> +/*
> + * user_access_begin() and user_access_end() define a region where
> + * unsafe user accessors can be used.
> + * During execution of this region, no sleeping functions should be called.
> + * Exceptions and interrupt shall exit the user_access region and re-enter it
> + * when returning to the interrupted context.
> + */

I would first have the bit about exceptions, then mention sleeping and add
something along the lines of

"[...] no sleeping functions should be called - we rely on exception
handling to take care of the user_access status for us, but that doesn't
happen when directly calling schedule()."

My wording's not the best but I just want something to point out *why*
sleeping ain't okay.

> #ifndef user_access_begin
> #define user_access_begin() do { } while (0)
> #define user_access_end() do { } while (0)
> @@ -273,6 +280,10 @@ static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
> #define unsafe_put_user(x, ptr, err) do { if (unlikely(__put_user(x, ptr))) goto err; } while (0)
> #endif
>
> +#ifndef unsafe_user_region_active
> +#define unsafe_user_region_active() false
> +#endif
> +
> #ifdef CONFIG_HARDENED_USERCOPY
> void usercopy_warn(const char *name, const char *detail, bool to_user,
> unsigned long offset, unsigned long len);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 6fedf3a..03f53c8 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3289,6 +3289,13 @@ static inline void schedule_debug(struct task_struct *prev)
> __schedule_bug(prev);
> preempt_count_set(PREEMPT_DISABLED);
> }
> +
> + if (unlikely(unsafe_user_region_active())) {
> + printk(KERN_ERR "BUG: scheduling while user_access enabled: %s/%d/0x%08x\n",
> + prev->comm, prev->pid, preempt_count());
> + dump_stack();
> + }
> +
> rcu_sleep_check();
>
> profile_hit(SCHED_PROFILING, __builtin_return_address(0));
> @@ -6151,6 +6158,18 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
> EXPORT_SYMBOL(___might_sleep);
> #endif
>
> +void __might_resched(const char *file, int line)
> +{
> + if (!unsafe_user_region_active())
> + return;
> +
> + printk(KERN_ERR
> + "BUG: rescheduling function called from user access context at %s:%d\n",
> + file, line);
> + dump_stack();
> +}

So this check is "careful, things might go bad" and the schedule_debug()
one is "things went bad". IIUC we'll always get this warning when we hit
the schedule_debug() one. I was going to suggest only keeping one of them,
but I think both hold value.

> +EXPORT_SYMBOL(__might_resched);
> +
> #ifdef CONFIG_MAGIC_SYSRQ
> void normalize_rt_tasks(void)
> {
> --
> 1.9.1
>