Re: [RFC PATCH] media: venus: Fix NULL pointer dereference in core selection

From: Stanimir Varbanov
Date: Mon Jun 22 2020 - 07:51:53 EST


Hi Doug,

Thanks for the fix and sorry for the late reply.

On 6/2/20 6:39 AM, Doug Anderson wrote:
> Hi,
>
> On Mon, Jun 1, 2020 at 3:03 PM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
>>
>> The newly-introduced function min_loaded_core() iterates over all of
>> the venus instances and tries to figure out how much load each instance
>> is putting on each core. Not all instances, however, might be fully
>> initialized. Specifically the "codec_freq_data" is initialized as
>> part of vdec_queue_setup(), but an instance may already be in the list
>> of all instances before that time.
>>
>> Let's band-aid this by checking to see if codec_freq_data is NULL
>> before dereferencing.
>>
>> NOTE: without this fix I was running into a crash. Specifically there
>> were two venus instances. One was doing start_streaming. The other
>> was midway through queue_setup but hadn't yet gotten to initting
>> "codec_freq_data".
>>
>> Fixes: eff82f79c562 ("media: venus: introduce core selection")
>> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
>> ---
>> I'm not massively happy about this commit but it's the best I could
>> come up with without being much more of an expert in the venus codec.
>> If someone has a better patch then please just consider this one to be
>> a bug report and feel free to submit a better fix! :-)
>>
>> In general I wonder a little bit about whether it's safe to be peeking
>> at all the instances without grabbing the "inst->lock" on each one. I
>> guess it is since we do it both here and in load_scale_v4() but I
>> don't know why.
>>
>> One thought I had was that we could fully avoid accessing the other
>> instances, at least in min_loaded_core(), by just keeping track of
>> "core1_load" and "core2_load" in "struct venus_core". Whenever we add
>> a new instance we could add to the relevant variables and whenever we
>> release an instance we could remove. Such a change seems cleaner but
>> would require someone to test to make sure we didn't miss any case
>> (AKA we always properly added/removed our load from the globals).

Thanks for the suggestion (I also thought about something similar). I
will try to cook something up.
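
To make the bookkeeping idea above concrete, here is a rough userspace
sketch, not driver code. Everything in it is an assumption made for
illustration: the struct and function names (core_model, inst_model,
inst_add_load, inst_remove_load) are hypothetical stand-ins, two cores
are assumed, and the locking that the real driver would need around
these updates is left out.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES 2 /* assumption: two codec cores, as in the discussion */

/* Hypothetical stand-in for the relevant part of struct venus_core. */
struct core_model {
	uint32_t core_load[NUM_CORES]; /* running load total per core */
};

/* Hypothetical stand-in for the per-instance bookkeeping. */
struct inst_model {
	uint32_t load; /* load this instance currently contributes */
	int core_idx;  /* core it is accounted on, -1 if none */
};

/* Pick the least loaded core from the cached totals alone. */
static int min_loaded_core_idx(const struct core_model *core)
{
	int best = 0;

	for (int i = 1; i < NUM_CORES; i++)
		if (core->core_load[i] < core->core_load[best])
			best = i;
	return best;
}

/* Account a new instance on the least loaded core; returns that core. */
static int inst_add_load(struct core_model *core, struct inst_model *inst,
			 uint32_t load)
{
	int idx = min_loaded_core_idx(core);

	core->core_load[idx] += load;
	inst->load = load;
	inst->core_idx = idx;
	return idx;
}

/* Undo the accounting on release; safe to call more than once. */
static void inst_remove_load(struct core_model *core, struct inst_model *inst)
{
	if (inst->core_idx < 0)
		return;

	/* A missed add/remove pairing would show up here as an underflow. */
	assert(core->core_load[inst->core_idx] >= inst->load);
	core->core_load[inst->core_idx] -= inst->load;
	inst->load = 0;
	inst->core_idx = -1;
}
```

With something like this, core selection never has to look at other
instances, so a half-initialized instance cannot be dereferenced. The
cost is the one Doug mentions: every path that starts or releases an
instance has to keep the totals balanced.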

>>
>> drivers/media/platform/qcom/venus/pm_helpers.c | 2 ++
>> 1 file changed, 2 insertions(+)
>
> This fixes the same crash as the patch:
>
> https://lore.kernel.org/r/1588314480-22409-1-git-send-email-mansur@xxxxxxxxxxxxxx
>

I'm going to take the approach from the patch linked above because it
takes into account the state of the instance. An instance can be
opened/created without streaming being started any time soon, so it
would not be correct to include its load in the calculations.
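
Roughly, the state-aware idea looks like the userspace sketch below.
This is only an illustration, not the actual patch: the names
(inst_view, INST_OPEN, INST_STREAMING, core_load_for) are hypothetical,
and the load formula (macroblocks per second times vpp_freq) is an
assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instance states; the driver has its own state enum. */
enum inst_state { INST_OPEN, INST_STREAMING };

struct freq_data {
	uint32_t vpp_freq;
};

/* Hypothetical, reduced view of one instance. */
struct inst_view {
	enum inst_state state;
	const struct freq_data *freq; /* NULL until queue setup has run */
	uint32_t mbs_per_sec;
	int core_idx;
};

/*
 * Sum the load on one core, counting only instances that are actually
 * streaming. Instances that are merely open contribute nothing, and the
 * NULL check is kept as a second line of defence.
 */
static uint64_t core_load_for(const struct inst_view *insts, size_t n,
			      int core_idx)
{
	uint64_t total = 0;

	assert(insts != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct inst_view *inst = &insts[i];

		if (inst->core_idx != core_idx)
			continue;
		if (inst->state != INST_STREAMING)
			continue;
		if (!inst->freq)
			continue;

		total += (uint64_t)inst->mbs_per_sec * inst->freq->vpp_freq;
	}
	return total;
}
```

The state check is what separates this from the plain NULL check: an
instance that is open but idle is skipped because of its state, not
because one of its fields happens to be uninitialized.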

--
regards,
Stan