Re: [RFC PATCH 08/13] x86/process/64: Clean up uintr task fork and exit paths

From: Thomas Gleixner
Date: Thu Sep 23 2021 - 21:02:26 EST


On Mon, Sep 13 2021 at 13:01, Sohil Mehta wrote:

> The user interrupt MSRs and the user interrupt state is task specific.
> During task fork and exit clear the task state, clear the MSRs and
> dereference the shared resources.
>
> Some of the memory resources like the UPID are referenced in the file
> descriptor and could be in use while the uintr_fd is still valid.
> Instead of freeing up the UPID just dereference it.

Dereferencing the UPID, i.e. accessing task->upid->foo, helps in which way?

You want to drop the reference count, I assume. Then please write it
that way.
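
To make the terminology concrete, here is a minimal userspace sketch of what "dropping a reference" means, as opposed to dereferencing a pointer. It is not the patch's code: `struct upid_ctx` here is a hypothetical stand-in, the counter is a plain int where kernel code would use refcount_t, and the `freed` flag exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's UPID context, not the kernel struct. */
struct upid_ctx {
	int refs;	/* plain int for illustration; kernel code would use refcount_t */
	bool *freed;	/* observation hook: set when the object is released */
};

/* Allocate an object that starts with one reference held by the caller. */
static struct upid_ctx *upid_ctx_alloc(bool *freed)
{
	struct upid_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->refs = 1;
	ctx->freed = freed;
	*freed = false;
	return ctx;
}

/* Take an additional reference, e.g. for a file descriptor. */
static void upid_ctx_get(struct upid_ctx *ctx)
{
	ctx->refs++;
}

/* Drop one reference. The memory is freed only when the last one goes away. */
static bool upid_ctx_put(struct upid_ctx *ctx)
{
	assert(ctx->refs > 0);
	if (--ctx->refs > 0)
		return false;
	*ctx->freed = true;
	free(ctx);
	return true;
}
```

With two holders (say the task and a uintr_fd), the first put leaves the object alive and only the second one frees it.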

> Eventually when every user releases the reference the memory resource
> will be freed up.

Yeah, eventually or not...

> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c

> @@ -260,6 +260,7 @@ int fpu_clone(struct task_struct *dst)
> {
> 	struct fpu *src_fpu = &current->thread.fpu;
> 	struct fpu *dst_fpu = &dst->thread.fpu;
> +	struct uintr_state *uintr_state;
>
> 	/* The new task's FPU state cannot be valid in the hardware. */
> 	dst_fpu->last_cpu = -1;
> @@ -284,6 +285,14 @@ int fpu_clone(struct task_struct *dst)
>
> 	else
> 		save_fpregs_to_fpstate(dst_fpu);
> +
> +	/* UINTR state is not expected to be inherited (in the current design). */
> +	if (static_cpu_has(X86_FEATURE_UINTR)) {
> +		uintr_state = get_xsave_addr(&dst_fpu->state.xsave, XFEATURE_UINTR);
> +		if (uintr_state)
> +			memset(uintr_state, 0, sizeof(*uintr_state));
> +	}

1) If the FPU registers are up to date then this can be completely
avoided by excluding the UINTR component from XSAVES.

2) If the task never used that muck then UINTR is in init state and
clearing that memory is a redundant exercise because it has been
cleared already.

So yes, this clearly is evidence how this is enhancing performance.
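
The first point can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_xsaves()` and `model_xrstors()` are hypothetical helpers that imitate, in a very simplified form, how a compacted save records which components it wrote and how a restore puts unrecorded components into their init (zero) state. Each component is reduced to a single 64-bit value, and the component number 14 for UINTR is an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_COMPONENTS	16
#define MODEL_XFEATURE_FP	0
#define MODEL_XFEATURE_UINTR	14	/* assumed component number for the model */
#define MODEL_BIT(n)		(UINT64_C(1) << (n))

/* Live register state: which components are in use, and their values. */
struct model_regs {
	uint64_t in_use;
	uint64_t comp[MODEL_NR_COMPONENTS];
};

/* Memory image: header bitmap plus one slot per component. */
struct model_xsave {
	uint64_t xfeatures;
	uint64_t comp[MODEL_NR_COMPONENTS];
};

/* Save only the components that are both in use and requested by mask. */
static void model_xsaves(struct model_xsave *buf, const struct model_regs *regs,
			 uint64_t mask)
{
	uint64_t saved = regs->in_use & mask;

	for (int i = 0; i < MODEL_NR_COMPONENTS; i++) {
		if (saved & MODEL_BIT(i))
			buf->comp[i] = regs->comp[i];
	}
	/* Components that were not saved are marked as being in init state. */
	buf->xfeatures = saved;
}

/* Restore: a component whose header bit is clear comes back as init (zero). */
static void model_xrstors(struct model_regs *regs, const struct model_xsave *buf)
{
	for (int i = 0; i < MODEL_NR_COMPONENTS; i++)
		regs->comp[i] = (buf->xfeatures & MODEL_BIT(i)) ? buf->comp[i] : 0;
	regs->in_use = buf->xfeatures;
}

/* Clone the parent's live state for a child without inheriting UINTR. */
static void model_clone_fpstate(struct model_xsave *dst, const struct model_regs *src)
{
	model_xsaves(dst, src, ~MODEL_BIT(MODEL_XFEATURE_UINTR));
	assert(!(dst->xfeatures & MODEL_BIT(MODEL_XFEATURE_UINTR)));
}
```

In this model the child never sees the parent's UINTR value even if stale bytes remain in the buffer slot, because the header bit is clear and the restore ignores the slot. No memset is involved.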

> +/*
> + * This should only be called from exit_thread().

Should? Would? Maybe or what?

> + * exit_thread() can happen in current context when the current thread is
> + * exiting or it can happen for a new thread that is being created.

Ah right, that makes sense. If a new thread is created then it can call
exit_thread(), right?

> + * For new threads is_uintr_receiver() should fail.

Should fail?

> + */
> +void uintr_free(struct task_struct *t)
> +{
> +	struct uintr_receiver *ui_recv;
> +	struct fpu *fpu;
> +
> +	if (!static_cpu_has(X86_FEATURE_UINTR) || !is_uintr_receiver(t))
> +		return;
> +
> +	if (WARN_ON_ONCE(t != current))
> +		return;
> +
> +	fpu = &t->thread.fpu;
> +
> +	fpregs_lock();
> +
> +	if (fpregs_state_valid(fpu, smp_processor_id())) {
> +		wrmsrl(MSR_IA32_UINTR_MISC, 0ULL);
> +		wrmsrl(MSR_IA32_UINTR_PD, 0ULL);
> +		wrmsrl(MSR_IA32_UINTR_RR, 0ULL);
> +		wrmsrl(MSR_IA32_UINTR_STACKADJUST, 0ULL);
> +		wrmsrl(MSR_IA32_UINTR_HANDLER, 0ULL);
> +	} else {
> +		struct uintr_state *p;
> +
> +		p = get_xsave_addr(&fpu->state.xsave, XFEATURE_UINTR);
> +		if (p) {
> +			p->handler = 0;
> +			p->uirr = 0;
> +			p->upid_addr = 0;
> +			p->stack_adjust = 0;
> +			p->uinv = 0;
> +		}
> +	}
> +
> +	/* Check: Can a thread be context switched while it is exiting? */

This looks like a question which should be answered _before_ writing
such code.

> +	ui_recv = t->thread.ui_recv;
> +
> +	/*
> +	 * Suppress notifications so that no further interrupts are
> +	 * generated based on this UPID.
> +	 */
> +	set_bit(UPID_SN, (unsigned long *)&ui_recv->upid_ctx->upid->nc.status);
> +	put_upid_ref(ui_recv->upid_ctx);
> +	kfree(ui_recv);
> +	t->thread.ui_recv = NULL;

Again, why does all this put/kfree muck need to be within the fpregs locked section?

> +	fpregs_unlock();
> +}
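
As an illustration of the locking question, the following userspace sketch shows the general shape of keeping only the register-state clearing inside the locked section and doing the release afterwards. Everything in it is hypothetical: the `model_*` names, the lock modelled as a depth counter, and the task and receiver structs are stand-ins, and the UPID_SN suppression step is left out. Whether this ordering is actually right for the kernel code is for the patch to establish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for fpregs_lock()/fpregs_unlock(): just track nesting depth. */
static int model_lock_depth;
/* Observation hook: records whether the release ran with the lock dropped. */
static bool released_outside_lock;

static void model_fpregs_lock(void)   { model_lock_depth++; }
static void model_fpregs_unlock(void) { model_lock_depth--; }

/* Hypothetical receiver bookkeeping, freed when the task exits. */
struct model_receiver {
	int vector;
};

/* Hypothetical task with five UINTR register values and a receiver pointer. */
struct model_task {
	unsigned long uintr_regs[5];
	struct model_receiver *ui_recv;
};

/* Release path: nothing here needs the register lock. */
static void model_release(struct model_receiver *r)
{
	released_outside_lock = (model_lock_depth == 0);
	free(r);
}

static void model_uintr_free(struct model_task *t)
{
	struct model_receiver *r = t->ui_recv;

	if (!r)
		return;

	/* Only the register state is touched under the lock. */
	model_fpregs_lock();
	for (int i = 0; i < 5; i++)
		t->uintr_regs[i] = 0;
	model_fpregs_unlock();

	/* Bookkeeping and memory release happen after the unlock. */
	t->ui_recv = NULL;
	model_release(r);
}
```

The locked region then covers exactly the state that the lock protects, which keeps the section short and makes it obvious what it is guarding.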

Thanks,

tglx
