Re: [PATCH] misc: sgi-gru: fix use-after-free error in gru_set_context_option, gru_fault and gru_handle_user_call_os

From: Greg KH
Date: Fri Sep 16 2022 - 04:16:06 EST


On Fri, Sep 16, 2022 at 03:39:57PM +0800, xmzyshypnc wrote:
> In drivers/misc/sgi-gru/grufile.c, the gru_file_unlocked_ioctl function can be called from user space. If the req is GRU_SET_CONTEXT_OPTION, it calls gru_set_context_option.

Please properly wrap your changelog text at 72 columns like your editor
asked you to when you wrote the changelog text.

>
> In gru_set_context_option, because req is controlled by the user (copy_from_user(&req, (void __user *)arg, sizeof(req))), we can get into the sco_blade_chiplet case and reach the gru_check_context_placement call.
>
> In gru_check_context_placement, if the error path is taken, i.e. gru_check_chiplet_assignment returns 0, the code falls into gru_unload_context, which calls gru_free_gru_context->gts_drop. Because gts->ts_refcnt was set to 1 in gru_alloc_gts, gts_drop finally calls kfree(gts).
>
> gru_set_context_option then calls gru_unlock_gts on the already-freed gts, which is a typical use-after-free.
>
> The same problem exists in gru_handle_user_call_os and gru_fault.
>
> Fix it by introducing a return value that indicates whether gts is still in a good state, and free the gts in the caller when the gru_check_chiplet_assignment check fails.
>
> Signed-off-by: xmzyshypnc <1002992920@xxxxxx>
> ---
> drivers/misc/sgi-gru/grufault.c | 14 ++++++++++++--
> drivers/misc/sgi-gru/grumain.c | 19 +++++++++++++++----
> drivers/misc/sgi-gru/grutables.h | 2 +-
> 3 files changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index d7ef61e602ed..08e837a45ad7 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -656,7 +656,13 @@ int gru_handle_user_call_os(unsigned long cb)
> if (ucbnum >= gts->ts_cbr_au_count * GRU_CBR_AU_SIZE)
> goto exit;
>
> - gru_check_context_placement(gts);
> + ret = gru_check_context_placement(gts);
> +
> + if (ret) {

No blank line needed.

> + gru_unlock_gts(gts);
> + gru_unload_context(gts, 1);
> + return -EINVAL;

Jump to an error block at the end of the function instead?

> + }
>
> /*
> * CCH may contain stale data if ts_force_cch_reload is set.
> @@ -874,7 +880,7 @@ int gru_set_context_option(unsigned long arg)
> } else {
> gts->ts_user_blade_id = req.val1;
> gts->ts_user_chiplet_id = req.val0;
> - gru_check_context_placement(gts);
> + ret = gru_check_context_placement(gts);
> }
> break;
> case sco_gseg_owner:
> @@ -889,6 +895,10 @@ int gru_set_context_option(unsigned long arg)
> ret = -EINVAL;
> }
> gru_unlock_gts(gts);
> + if (ret) {
> + gru_unload_context(gts, 1);
> + ret = -EINVAL;
> + }
>
> return ret;
> }
> diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c
> index 9afda47efbf2..e1ecf86df3c1 100644
> --- a/drivers/misc/sgi-gru/grumain.c
> +++ b/drivers/misc/sgi-gru/grumain.c
> @@ -716,9 +716,10 @@ static int gru_check_chiplet_assignment(struct gru_state *gru,
> * chiplet. Misassignment can occur if the process migrates to a different
> * blade or if the user changes the selected blade/chiplet.
> */
> -void gru_check_context_placement(struct gru_thread_state *gts)
> +int gru_check_context_placement(struct gru_thread_state *gts)
> {
> struct gru_state *gru;
> + int ret = 0;
>
> /*
> * If the current task is the context owner, verify that the
> @@ -727,14 +728,16 @@ void gru_check_context_placement(struct gru_thread_state *gts)
> */
> gru = gts->ts_gru;
> if (!gru || gts->ts_tgid_owner != current->tgid)
> - return;
> + return ret;

Why is this succeeding if there was an error?

>
> if (!gru_check_chiplet_assignment(gru, gts)) {
> STAT(check_context_unload);
> - gru_unload_context(gts, 1);
> + ret = 1;

1 is not a valid error value.


> } else if (gru_retarget_intr(gts)) {
> STAT(check_context_retarget_intr);
> }
> +
> + return ret;
> }
>
>
> @@ -919,6 +922,7 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
> struct gru_thread_state *gts;
> unsigned long paddr, vaddr;
> unsigned long expires;
> + int ret;
>
> vaddr = vmf->address;
> gru_dbg(grudev, "vma %p, vaddr 0x%lx (0x%lx)\n",
> @@ -934,7 +938,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
> mutex_lock(&gts->ts_ctxlock);
> preempt_disable();
>
> - gru_check_context_placement(gts);
> + ret = gru_check_context_placement(gts);
> + if (ret) {
> + mutex_unlock(&gts->ts_ctxlock);
> + gru_unload_context(gts, 1);
> + return VM_FAULT_NOPAGE;

Why not return ret?

> + }
>
> if (!gts->ts_gru) {
> STAT(load_user_context);
> @@ -958,6 +967,8 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
> preempt_enable();
> mutex_unlock(&gts->ts_ctxlock);
>
> +
> +

Why were the blank lines added?

thanks,

greg k-h