Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

From: Zheng Hacker
Date: Tue Mar 07 2023 - 23:33:10 EST


Hi Kyrie,

Thanks for your quick response and detailed information. I understand
more about the design with your explanation. As you say, there
shouldn't be any issues with the normal flow. And if the attacker use
some tricks to enhance the race window and call remove function
directly as [1] mentioned, this might trigger uaf bug unexpectedly.

Besides, I found another code path which may call v4l2_m2m_cancel_job
and finally make the situation looks safe. The invoking chain is

mtk_jpeg_release
v4l2_m2m_ctx_release
v4l2_m2m_cancel_job

If the attacker have to call mtk_jpeg_release function before the
remove function, it might be harmless. But I'm not sure about that.

[1] https://www.usenix.org/system/files/sec21-lee-yoochan.pdf

Best regards,
Zheng

Kyrie Wu (吴晗) <Kyrie.Wu@xxxxxxxxxxxx> 于2023年3月8日周三 11:32写道:
>
> On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote:
> > Hi Kyrie,
> >
> > Thank you for your careful analysis and response. I still have some
> > areas that I don't quite understand and would like to ask for
> > clarification. That is, how do the function pointers for stop
> > streaming, initialized as mtk_jpeg_enc_stop_streaming and
> > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? I
> > would greatly appreciate your response.
> >
> > Best regards,
> > Zheng
>
> Dear zheng,
>
> For stop streaming, what I mean is that stoppping jpeg decoding or
> encoding.
> Ok, let me introduce the sw flow of stop streaming:
> Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will call
> v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is
> running), it will wait event, the event is wake up by
> v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by jpeg
> dec/enc irq handler, which means that the waitting would result mtk hw
> to finish dec/enc, irq will occur and irq handler would cancel timeout
> worker. The follow is shown as blow.
> v4l2_m2m_ioctl_streamoff
> v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec_irq
> wait evnet <------ wake up ------v4l2_m2m_job_finish
> cancel timeout work
>
> As mentioned above, if it is normal stop streaming action, there will
> be no happen that the timeout worker does not canceled.
>
> But if mtk_jpeg_remove is called directly without above flow, it would
> cause lots of issues.
>
> Regards,
> Kyrie.
> >
> > Kyrie Wu (吴晗) <Kyrie.Wu@xxxxxxxxxxxx> 于2023年3月8日周三 10:02写道:
> > >
> > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote:
> > > > The timer function was set in mtk_jpeg_probe with
> > > > mtk_jpeg_job_timeout_work function.
> > > > And the worker is started in mtk_jpeg_dec_device_run.
> > > > There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq)
> > > > which
> > > > may cancel the worker.
> > > > They are used as IRQ handler function which is saved as function
> > > > pointer in a variable.
> > > > In mtk_jpeg_probe, they are registered by devm_request_irq:
> > > >
> > > > ret = devm_request_irq(&pdev->dev,
> > > > jpeg_irq,
> > > > jpeg->variant->irq_handler,
> > > > 0,
> > > > pdev->name, jpeg);
> > > > if (ret) {
> > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n",
> > > > jpeg_irq, ret);
> > > > return ret;
> > > > }
> > > >
> > > > However, if we remove the module without triggering the irq, the
> > > > worker will never be removed.
> > > >
> > > > As for the schedule, mtk_jpeg_dec_device_run and
> > > > mtk_jpeg_enc_device_run will start the worker.
> > > > The schedule invoking is quite complicated. As far as I know, the
> > > > invoking chain is as follows:
> > > >
> > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run
> > > >
> > > > the v4l2_m2m_device_run_work function is also a worker which is
> > > > set
> > > > in
> > > > v4l2_m2m_init and started in
> > > > v4l2_m2m_schedule_next_job.
> > > >
> > > > Before calling remove function, the mtk_jpeg_release was invoked
> > > > to
> > > > release the related resource.
> > > >
> > > > v4l2_m2m_cancel_job will cancel the job by calling
> > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > >
> > > > But this will only cancel the current queue by
> > > > list_del(&m2m_dev->curr_ctx->queue);
> > > >
> > > > I think this can not cancel the posted task mentioned before. So
> > > > I
> > > > think if mtk_jpeg_job_timeout_work
> > > >
> > > > is working on, and using jpeg->m2m_dev after freeing it in
> > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > >
> > > > static int mtk_jpeg_release(struct file *file)
> > > > {
> > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file-
> > > > >private_data);
> > > >
> > > > mutex_lock(&jpeg->lock);
> > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > v4l2_fh_del(&ctx->fh);
> > > > v4l2_fh_exit(&ctx->fh);
> > > > kfree(ctx);
> > > > mutex_unlock(&jpeg->lock);
> > > > return 0;
> > > > }
> > > >
> > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
> > > > {
> > > > /* wait until the current context is dequeued from job_queue */
> > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > >
> > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > >
> > > > kfree(m2m_ctx);
> > > > }
> > > >
> > > > Note that all of this is static analysis, which may be false
> > > > positive.
> > > > Feel free to tell me if there is something I've missed.
> > > >
> > > > Regard,
> > > > Zheng
> > >
> > > Dear Zheng,
> > >
> > > You set up an application scenario:
> > > cpu1 is using the mtk-jpeg driver and timeout work has been
> > > scheduled.
> > > At the same time cpu0 wanted to remove the mtk-jpeg driver, which
> > > caused the UAF bug.
> > > I wonder if such an irrational application scenario could exist.
> > > This
> > > scenario, as you described, not only leads to the problems you
> > > mentioned, but also to output&capture memory leaks and unreleased
> > > resources, such as spinlock.
> > > Typically, if we want to remove the driver, we firstly do stop
> > > streaming, which ensures that the worker has been canceled.
> > > I agree with your changes from the perspective of strengthening the
> > > robustness of the driver code.
> > >
> > > Regards,
> > > Kyrie.
> > > >
> > > > Irui Wang (王瑞) <Irui.Wang@xxxxxxxxxxxx> 于2023年3月7日周二 18:23写道:
> > > > >
> > > > > Dear Angelo and Zheng,
> > > > >
> > > > > Thanks for your patch and comments.
> > > > >
> > > > > Dear Kyrie,
> > > > >
> > > > > Please help to check this, thanks.
> > > > >
> > > > > Best Regards
> > > > >
> > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno
> > > > > wrote:
> > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Is there anyone who can help with this? I can provide more
> > > > > > > details
> > > > > > > like invoking chain if needed.
> > > > > > >
> > > > > >
> > > > > > Providing more details is always good. Please do.
> > > > > >
> > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg.
> > > > > >
> > > > > > Regards,
> > > > > > Angelo
> > > > > >
> > > > > > > Thanks,
> > > > > > > Zheng
> > > > > > >
> > > > > > > Zheng Wang <zyytlz.wz@xxxxxxx> 于2023年3月6日周一 14:28写道:
> > > > > > > >
> > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > > > > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > > > > > > > and mtk_jpeg_enc_device_run may be called to start the
> > > > > > > > work.
> > > > > > > > If we remove the module which will call mtk_jpeg_remove
> > > > > > > > to make cleanup, there may be a unfinished work. The
> > > > > > > > possible sequence is as follows, which will cause a
> > > > > > > > typical UAF bug.
> > > > > > > >
> > > > > > > > Fix it by canceling the work before cleanup in the
> > > > > > > > mtk_jpeg_remove
> > > > > > > >
> > > > > > > > CPU0 CPU1
> > > > > > > >
> > > > > > > > |mtk_jpeg_job_timeout_work
> > > > > > > > mtk_jpeg_remove |
> > > > > > > > v4l2_m2m_release |
> > > > > > > > kfree(m2m_dev); |
> > > > > > > > |
> > > > > > > > | v4l2_m2m_get_curr_priv
> > > > > > > > | m2m_dev->curr_ctx //use
> > > > > > > >
> > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@xxxxxxx>
> > > > > > > > ---
> > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c |
> > > > > > > > 2 +-
> > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git
> > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > ---
> > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > +++
> > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct
> > > > > > > > platform_device *pdev)
> > > > > > > > static int mtk_jpeg_remove(struct platform_device
> > > > > > > > *pdev)
> > > > > > > > {
> > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > -
> > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work);
> > > > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > > > video_unregister_device(jpeg->vdev);
> > > > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > > > --
> > > > > > > > 2.25.1
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >