Re: [PATCH] venus: avoid extra locking in driver

From: mansur
Date: Fri May 01 2020 - 03:09:57 EST


On 2020-03-11 08:34, Alexandre Courbot wrote:
On Tue, Mar 10, 2020 at 7:07 AM Jeffrey Kardatzke <jkardatzke@xxxxxxxxxx> wrote:

On Thu, Mar 5, 2020 at 11:50 PM Alexandre Courbot <acourbot@xxxxxxxxxxxx> wrote:
>
> On Fri, Mar 6, 2020 at 2:34 PM Mansur Alisha Shaik
> <mansur@xxxxxxxxxxxxxx> wrote:
> >
> > This change will avoid extra locking in driver.
>
> Could you elaborate a bit more on the problem that this patch solves?

For us it fixes a kernel null deref that happens when we run the
MultipleEncoders test (I've verified this to be true).

>
> >
> > Signed-off-by: Mansur Alisha Shaik <mansur@xxxxxxxxxxxxxx>
> > ---
> > drivers/media/platform/qcom/venus/core.c | 2 +-
> > drivers/media/platform/qcom/venus/core.h | 2 +-
> > drivers/media/platform/qcom/venus/helpers.c | 11 +++++++++--
> > drivers/media/platform/qcom/venus/pm_helpers.c | 8 ++++----
> > 4 files changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
> > index 194b10b9..75d38b8 100644
> > --- a/drivers/media/platform/qcom/venus/core.c
> > +++ b/drivers/media/platform/qcom/venus/core.c
> > @@ -447,7 +447,7 @@ static const struct freq_tbl sdm845_freq_table[] = {
> > { 244800, 100000000 }, /* 1920x1080@30 */
> > };
> >
> > -static struct codec_freq_data sdm845_codec_freq_data[] = {
> > +static const struct codec_freq_data sdm845_codec_freq_data[] = {
> > { V4L2_PIX_FMT_H264, VIDC_SESSION_TYPE_ENC, 675, 10 },
> > { V4L2_PIX_FMT_HEVC, VIDC_SESSION_TYPE_ENC, 675, 10 },
> > { V4L2_PIX_FMT_VP8, VIDC_SESSION_TYPE_ENC, 675, 10 },
> > diff --git a/drivers/media/platform/qcom/venus/core.h b/drivers/media/platform/qcom/venus/core.h
> > index ab7c360..8c8d0e9 100644
> > --- a/drivers/media/platform/qcom/venus/core.h
> > +++ b/drivers/media/platform/qcom/venus/core.h
> > @@ -245,7 +245,7 @@ struct venus_buffer {
> > struct clock_data {
> > u32 core_id;
> > unsigned long freq;
> > - const struct codec_freq_data *codec_freq_data;
> > + struct codec_freq_data codec_freq_data;
> > };
> >
> > #define to_venus_buffer(ptr) container_of(ptr, struct venus_buffer, vb)
> > diff --git a/drivers/media/platform/qcom/venus/helpers.c b/drivers/media/platform/qcom/venus/helpers.c
> > index bcc6038..550c4ff 100644
> > --- a/drivers/media/platform/qcom/venus/helpers.c
> > +++ b/drivers/media/platform/qcom/venus/helpers.c
> > @@ -807,6 +807,7 @@ int venus_helper_init_codec_freq_data(struct venus_inst *inst)
> > unsigned int i, data_size;
> > u32 pixfmt;
> > int ret = 0;
> > + bool found = false;
> >
> > if (!IS_V4(inst->core))
> > return 0;
> > @@ -816,16 +817,22 @@ int venus_helper_init_codec_freq_data(struct venus_inst *inst)
> > pixfmt = inst->session_type == VIDC_SESSION_TYPE_DEC ?
> > inst->fmt_out->pixfmt : inst->fmt_cap->pixfmt;
> >
> > + memset(&inst->clk_data.codec_freq_data, 0,
> > + sizeof(inst->clk_data.codec_freq_data));
> > +
> > for (i = 0; i < data_size; i++) {
> > if (data[i].pixfmt == pixfmt &&
> > data[i].session_type == inst->session_type) {
> > - inst->clk_data.codec_freq_data = &data[i];
> > + inst->clk_data.codec_freq_data = data[i];
>
> From the patch I'd infer that inst->clk_data.codec_freq_data needs to
> change at runtime. Is this what happens? Why? I'd expect that
> frequency tables remain constant, and thus that the global
> sdm845_codec_freq_data can remain constant while
> clock_data::codec_freq_data is a const reference to it. What prevents
> this from happening?
>
> > + found = true;
> > break;
> > }
> > }
> >
> > - if (!inst->clk_data.codec_freq_data)
> > + if (!found) {
> > + dev_err(inst->core->dev, "cannot find codec freq data\n");
> > ret = -EINVAL;
> > + }
> >
> > return ret;
> > }
> > diff --git a/drivers/media/platform/qcom/venus/pm_helpers.c b/drivers/media/platform/qcom/venus/pm_helpers.c
> > index abf9315..240845e 100644
> > --- a/drivers/media/platform/qcom/venus/pm_helpers.c
> > +++ b/drivers/media/platform/qcom/venus/pm_helpers.c
> > @@ -496,7 +496,7 @@ min_loaded_core(struct venus_inst *inst, u32 *min_coreid, u32 *min_load)
> > list_for_each_entry(inst_pos, &core->instances, list) {
> > if (inst_pos == inst)
> > continue;
> > - vpp_freq = inst_pos->clk_data.codec_freq_data->vpp_freq;
> > + vpp_freq = inst_pos->clk_data.codec_freq_data.vpp_freq;

This is the main thing it fixes (this is where the null deref occurs).
If there's multiple instances in use and the other instance hasn't
populated the codec_freq_data pointer then we'll hit a null deref
here.

Couldn't this be fixed by checking the pointer for NULL here or
(probably better) populating codec_freq_data earlier so that it is
always valid?

This can be fix by checking for NULL pointer as well, this looks like masking the actual issue.
Currently we are considering the instances which are available
in core->inst list for load calculation in min_loaded_core()
function, but this is incorrect because by the time we call
decide_core() for second instance, the third instance not
filled yet codec_freq_data pointer.

So we consider the instances for load calculation for those session has started.
and uploaded V2 version https://lore.kernel.org/patchwork/patch/1234136/ for the same.

This fix looks like it is replacing a NULL pointer dereference with
access to data initialized to fallback values (which may or may not be
meaningful), and I don't see the need to copy what is effectively
constant data into each instance.