Re: [PATCH 0.5/3] livepatch: Convert stack entries array to percpu

From: Petr Mladek
Date: Tue Mar 14 2023 - 06:51:12 EST


On Mon 2023-03-13 16:33:46, Josh Poimboeuf wrote:
> On Fri, Mar 03, 2023 at 03:00:13PM +0100, Petr Mladek wrote:
> > > MAX_STACK_ENTRIES is 100, which seems excessive. If we halved that, the
> > > array would be "only" 400 bytes, which is *almost* reasonable to
> > > allocate on the stack?
> >
> > It is just for the stack in the process context. Right?
> >
> > I think that I have never seen a stack with over 50 entries. And in
> > the worst case, a bigger amount of entries would "just" result in
> > a non-reliable stack which might be acceptable.
> >
> > It looks acceptable to me.
> >
> > > Alternatively we could have a percpu entries array... :-/
> >
> > That said, percpu entries would be fine as well. It sounds like
> > a good price for the livepatching feature. I think that livepatching
> > is used on big systems anyway.
> >
> > I slightly prefer the per-cpu solution.
>
> Booting a kernel with PREEMPT+LOCKDEP gave me a high-water mark of 60+
> stack entries, seen when probing a device. I decided not to mess with
> MAX_STACK_ENTRIES, and instead just convert the entries to percpu. This
> patch could be inserted at the beginning of the set.

Good to know.

>
> ---8<---
>
> Subject: [PATCH 0.5/3] livepatch: Convert stack entries array to percpu
>
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -240,12 +242,15 @@ static int klp_check_stack_func(struct klp_func *func, unsigned long *entries,
> */
> static int klp_check_stack(struct task_struct *task, const char **oldname)
> {
> - static unsigned long entries[MAX_STACK_ENTRIES];
> + unsigned long *entries = this_cpu_ptr(klp_stack_entries);
> struct klp_object *obj;
> struct klp_func *func;
> int ret, nr_entries;
>
> - ret = stack_trace_save_tsk_reliable(task, entries, ARRAY_SIZE(entries));
> + /* Protect 'klp_stack_entries' */
> + lockdep_assert_preemption_disabled();

I think about adding:

/*
* Stay on the safe side even when cond_resched() is called from
* an IRQ context by mistake.
*/
if (!in_task())
return -EINVAL;

Or is this prevented another way, please?

> +
> + ret = stack_trace_save_tsk_reliable(task, entries, MAX_STACK_ENTRIES);
> if (ret < 0)
> return -EINVAL;
> nr_entries = ret;

Otherwise, it looks good to me.

Best Regards,
Petr