On 3/2/2025 9:26 PM, Bryan O'Donoghue wrote:
On 02/03/2025 11:58, Vedang Nagar wrote:This is not correct. Its not the amount of time which determines the validity of
As mentioned in [1], With the regular firmware, after RX interrupt the data
The basic question : what is the lifetime of the data from RX interrupt to
consumption by another system agent, DSP, userspace, whatever ?
can be considered as valid until next interrupt is raised, but with the rouge
firmware, data can get invalid during the second read and our intention is to
avoid out of bound access read because of such issues.
This is definitely the part I don't compute.
1. RX interrupt
2. Frame#0 Some amount of time data is always valid
the data, its the possibility of rogue firmware which, if incase, puts up the
date in shared queue, would always be invalid, irrespective of time.
3. RX interrupt - new dataIt is a way to prevent any possibility of OOB, similar to how any API does check
4. Frame#1 new data delivered into a buffer
Are you describing a case between RX interrupts 1-3 or a case after 1-4?
Why do we need to write code for rouge firmware anyway ?
for validity of any arguments passed to it, prior to processing.
Below is the report which the reporter reported leading to OOB, let me know if
And the real question - if the data can be invalidated in the 1-3 window above
when is the safe time to snapshot that data ?
We seem to have alot of submissions to deal with 'rouge' firmware without I
think properly describing the problem of the _expected_ data lifetime.
So
a) What is the expected data lifetime of an RX buffer between one
RX IRQ and the next ?
I hope the answer to this is - APSS owns the buffer.
This is BTW usually the case in these types of asymmetric setups
with a flag or some other kind of semaphore that indicates which
side of the data-exchange owns the buffer.
b) In this rouge - buggy - firmware case what is the scope of the
potential race condition ?
What I'd really like to know here is why we have to seemingly
memcpy() again and again in seemingly incongrous and not
immediately obvious places in the code.
Would we not be better advised to do a memcpy() of the entire
RX frame in the RX IRQ handler path if as you appear to me
suggesting - the firmware can "race" with the APSS
i.e. the data-buffer ownership flag either doesn't work
or isn't respected by one side in the data-exchange.
Can we please have a detailed description of the race condition here ?
you are unable to deduce the trail leading to OOB here.
OOB read issue is in function event_seq_changed, please reference below code
snippet:
Buggy code snippet:
static void event_seq_changed(struct venus_core *core, struct venus_inst *inst,
struct hfi_msg_event_notify_pkt *pkt)
...
num_properties_changed = pkt->event_data2; //num_properties_changed is from
message and is not validated.
...
data_ptr = (u8 *)&pkt->ext_event_data[0];
do {
ptype = *((u32 *)data_ptr);
switch (ptype) {
case HFI_PROPERTY_PARAM_FRAME_SIZE:
data_ptr += sizeof(u32);
frame_sz = (struct hfi_framesize *)data_ptr;
event.width = frame_sz->width;
...
}
num_properties_changed--;
} while (num_properties_changed > 0);
```
There is no validation against `num_properties_changed = pkt->event_data2`, so
OOB read occurs.
Go through the reports from the reporter, it was quite evident in leading upto
I don't doubt the new memcpy() makes sense to you but without this detailed
understanding of the underlying problem its virtually impossible to debate the
appropriate remediation - perhaps this patch you've submitted - or some other
solution.
Sorry to dig into my trench here but, way more detail is needed.
[1]: https://lore.kernel.org/lkml/4cfc1fe1-2fab-4256-9ce2-
b4a0aad1069e@xxxxxxxxxx/T/#m5f1737b16e68f8b8fc1d75517356b6566d0ec619
Currently this issue has been discovered by external researchers at this
Why is it in this small specific window that the data can change but not
later ? What is the mechanism the data can change and how do the changes you
propose here address the data lifetime problem ?
point, but if any such OOB issue is discovered at later point as well then we
shall fix them as well.
Right but, I'm looking for a detailed description of the problem.
Can you describe from RX interrupt again what the expected data lifetime of the
RX frame is, which I hope we agree is until the next RX interrupt associated
with a given buffer with an ownership flag shared between firmware and APSS -
and then under what circumstances that "software contract" is being violated.
Also, with rougue firmware we cannot fix the data lifetime problem in my
opinion, but atleast we can fix the out of bound issues.
There is no additional memcpy() now in the v2 patch, but as part of the fix,
Without that context, I don't believe it is really possible to validate an
additional memcpy() here and there in the code as fixing anything.
we are just trying to retain the length of the packet which was being read in
the first memcpy() to avoid the OOB read access.
I can't make a suggestion because - personally speaking I still don't quite
understand the data-race you are describing.
OOB case.
Putting up the sequence for you to go over the interrupt handling and message
queue parsing of the packets from firmware
1.
https://elixir.bootlin.com/linux/v6.14-rc4/source/drivers/media/platform/qcom/venus/hfi_venus.c#L1082
2.
https://elixir.bootlin.com/linux/v6.14-rc4/source/drivers/media/platform/qcom/venus/hfi_msgs.c#L816
3. event handling (this particular case)
https://elixir.bootlin.com/linux/v6.14-rc4/source/drivers/media/platform/qcom/venus/hfi_msgs.c#L658
4.
https://elixir.bootlin.com/linux/v6.14-rc4/source/drivers/media/platform/qcom/venus/hfi_msgs.c#L22
the "struct hfi_msg_event_notify_pkt *pkt" pkt here is having the data read from
shared queue.
I get that you say the firmware is breaking the contract but, without more
detail on _how_ it breaks that contract I don't think it's really possible to
validate your fix here, fixes anything.
---
bod
Regards,
Vikash