Re: [PATCH] virt: acrn: fix invalid check past list iterator

From: Jakob Koschel
Date: Fri Apr 01 2022 - 03:16:59 EST




> On 1. Apr 2022, at 05:57, Li Fei1 <fei1.li@xxxxxxxxx> wrote:
>
> On Fri, Apr 01, 2022 at 05:22:36AM +0200, Jakob Koschel wrote:
>>
>>> On 1. Apr 2022, at 03:15, Li Fei1 <fei1.li@xxxxxxxxx> wrote:
>>>
>>> On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
>>>>
>>>>> On 30. Mar 2022, at 09:57, Li Fei1 <fei1.li@xxxxxxxxx> wrote:
>>>>>
>>>>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
>>>>>> The condition retry == 0 is theoretically possible even if 'client'
>>>>>> does not point to a valid element because no break was hit.
>>>>>>
>>>>>> To only execute the dev_warn if actually a break within the loop was
>>>>>> hit, a separate variable is used that is only set if it is ensured to
>>>>>> point to a valid client struct.
>>>>>>
>>>>> Hi Koschel
>>>>>
>>>>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
>>>>> The dev_warn should only been called when has_pending = true && retry == 0
>>>>
>>>> Maybe I don't understand but looking isolated at this function I could see a way to call
>>>> the dev_warn() with has_pending = false && retry == 0.
>>> Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
>>> has_pending is true after schedule_timeout_interruptible and we even didn't check
>>> there're other pending client on the ioreq_clients (because we can't make sure
>>> when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
>>
>> I'm sorry, I don't quite understand what you mean by that.
>> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
>> or not?
>>
> Yes, so what ? It just a hint there may have pending request.

if has_pending == false && retry == 0 when calling dev_warn() there are very clear
dependencies met:

* has_pending == false means that the list_for_each_entry() macro it *not* exit early.
* since list_for_each_entry() did *not* exit early, client will not hold a valid list entry
* using client->name is not safe and will not point to a valid string (perhaps not even an address)

I'm *only* talking about the case where has_pending == false, in case that's not clear.


> Even retry == 0 && has_pending = true,
> When we call dev_warn, the clent may not is pending.
>
>
>>>>
>>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
>>>>> has_pending = has_pending_request(client);
>>>>> if (has_pending)
>>>>> }
>>>>> spin_unlock_bh(&vm->ioreq_clients_lock);
>>>>
>>>> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
>>> What do you mean "client will not hold a valid list entry" ?
>>
>> Imagine a very simple example:
>>
>> struct acrn_ioreq_client *client;
>> list_for_each_entry(client, &vm->ioreq_clients, list) {
>> continue;
>> }
>>
>> dev_warn(acrn_dev.this_device,
>> "%s cannot flush pending request!\n", client->name); /* NOT GOOD */
>>
> If there're pending request, we would call schedule to schedule out then schedule back
> to check the list from the beginning. If there's no pending client, we point to the last
> client and break out the while loop.
>
> The code doesn't want to find the pending client and break out the while loop and call
> dev_warn. Please see the function comment.
>
>
>>
>> Since there is no break for the list_for_each_entry() iterator to return early,
>> client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
>> macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
>> Essentially before the terminating condition of the list traversal the following code is called:
>>
>> list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>
>> resulting in a:
>>
>> container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>
>> &vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
>> this call compute an invalid pointer.
>> Therefore using 'client' as in the example above (e.g. client->name) after the loop is
>> not safe. Since the loop can never return early in the simple example above it will
>> always break. On other cases (the one we are discussing here) there might be a chance that
>> there is one code path (in theory) where the loop did not exit early and 'client'
>> holds that 'invalid entry'.
>>
>> This would be the case with has_pending = false && retry == 0.
>>
>> I hope this makes sense.
>>
>>>
>>>>
>>>>>
>>>>> if (has_pending)
>>>>> schedule_timeout_interruptible(HZ / 100);
>>>>> } while (has_pending && --retry > 0);
>>>>
>>>> since has_pending && --retry > 0 is no longer true the loop stops.
>>>>
>>>>> if (retry == 0)
>>>>> dev_warn(acrn_dev.this_device,
>>>>> "%s cannot flush pending request!\n", client->name);
>>>> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
>>>> in a type confusion.
>>>>
>>>>>
>>>>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
>>>>> to schedule out to wait all the pending I/O requests would been completed.
>>>>>
>>>>> Thanks.
>>>>
>>>> Again, I'm not sure if this is realistically possible. I'm trying to remove
>>>> any use of the list iterator after the loop to make such potentially issues detectable
>>> You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
>>
>> I'm not exactly sure which loop you are referring to but no, I don't think we are still in
>> the do while loop.
>>
>> The only thing we know after the do while loop is that: !has_pending || retry == 0
>> And iff has_pending && retry == 0, then we shouldn't call the dev_warn().
>>
>>>> at compile time instead of relying on certain (difficult to maintain) conditions to be met
>>>> to avoid the type confusion.
>>>>
>>>> Thanks,
>>>> Jakob
>>
>> Jakob

Jakob