Re: [PATCH] xen-swiotlb: exchange memory with Xen only when pages are contiguous

From: Joe Jin
Date: Thu Oct 25 2018 - 14:56:17 EST


Hi all,

I just discussed this patch with Boris privately. His opinions (Boris,
please correct me if I misunderstood anything) are:

1. With or without the check, the code is incorrect; he thinks we need to
prevent freeing unallocated memory here.
2. On freeing, if an upper layer has already checked that the memory is
DMA-able, the check here makes no sense and we can remove all checks.
3. xen_create_contiguous_region() and xen_destroy_contiguous_region()
should come in pairs.
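To make point #3 concrete, here is a minimal userspace model (not kernel
code) of Boris's "1 byte at address 0" example from the thread below. The
counters stand in for the two hypercall wrappers, and struct xen_calls,
fits_mask(), model_alloc(), model_free_patched() and calls_paired() are
hypothetical names; the two conditions are copied from the quoted alloc
path and from the patch. The size is assumed to be already rounded up to
one 4 KiB page, as the quoted free path does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Counters standing in for the hypercall wrappers (model only). */
struct xen_calls {
	int creates;  /* times xen_create_contiguous_region() would run */
	int destroys; /* times xen_destroy_contiguous_region() would run */
};

/* True if [dev_addr, dev_addr + size) lies entirely at or below dma_mask. */
static bool fits_mask(uint64_t dev_addr, uint64_t size, uint64_t dma_mask)
{
	assert(size > 0); /* size - 1 would wrap around for size == 0 */
	return dev_addr + size - 1 <= dma_mask;
}

/* Alloc path: exchange only if the pages are not already DMA-able and
 * contiguous ("straddles" models range_straddles_page_boundary()). */
static void model_alloc(struct xen_calls *c, uint64_t dev_addr,
			uint64_t size, uint64_t dma_mask, bool straddles)
{
	if (!(fits_mask(dev_addr, size, dma_mask) && !straddles))
		c->creates++;
}

/* Free path with the condition proposed by the patch. */
static void model_free_patched(struct xen_calls *c, uint64_t dev_addr,
			       uint64_t size, uint64_t dma_mask,
			       bool straddles)
{
	if (fits_mask(dev_addr, size, dma_mask) && !straddles)
		c->destroys++;
}

/* Boris's invariant: every create is matched by exactly one destroy. */
static bool calls_paired(const struct xen_calls *c)
{
	return c->creates == c->destroys;
}
```

One alloc/free cycle with dev_addr = 0, size = 4096, a 32-bit dma_mask
(0xffffffff) and a contiguous region leaves creates at 0 and destroys at
1, so calls_paired() returns false: the free path calls
xen_destroy_contiguous_region() on a region that was never exchanged.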

For #1 and #3, I think we need something to track the association, such
as a list: on allocation, add the address to it; on freeing, check
whether the address is in the list.
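A minimal userspace sketch of such a list: record each region after a
successful xen_create_contiguous_region() and, on free, only destroy
regions found in the record. struct exchanged_region, track_exchanged()
and untrack_exchanged() are hypothetical names, and keying on the
physical address is an assumption. In-kernel code would instead use a
struct list_head protected by a spinlock, since allocations and frees can
run concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* One region obtained from Xen by a successful exchange (hypothetical). */
struct exchanged_region {
	uint64_t phys;
	unsigned int order;
	struct exchanged_region *next;
};

static struct exchanged_region *exchanged_list;

/* Alloc path: call after xen_create_contiguous_region() succeeds. */
static bool track_exchanged(uint64_t phys, unsigned int order)
{
	struct exchanged_region *r = malloc(sizeof(*r));

	if (!r)
		return false;
	r->phys = phys;
	r->order = order;
	r->next = exchanged_list;
	exchanged_list = r;
	return true;
}

/* Free path: returns true (and forgets the entry) only if this region was
 * exchanged, i.e. only then should xen_destroy_contiguous_region() run. */
static bool untrack_exchanged(uint64_t phys, unsigned int order)
{
	struct exchanged_region **pp;

	for (pp = &exchanged_list; *pp; pp = &(*pp)->next) {
		struct exchanged_region *r = *pp;

		if (r->phys == phys) {
			assert(r->order == order); /* mismatch is a caller bug */
			*pp = r->next;
			free(r);
			return true;
		}
	}
	return false;
}
```

Because an entry is removed when the region is freed, freeing an address
that was never exchanged, or freeing the same address twice, returns false
and never reaches the hypercall. If track_exchanged() fails on the alloc
path, the region would have to be destroyed and the allocation failed,
otherwise it could never be returned to Xen.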

For #2, I could not find any validation of the address anywhere on the
dma_free_coherent() call path, not only in xen-swiotlb.

From my side, I think the checks make sense: they prevent exchanging
non-contiguous memory with Xen, and they also make sure Xen has enough
DMA memory both for DMA and for guest creation. Could we merge this
patch to avoid exchanging non-contiguous memory with Xen?
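The three free-path conditions discussed in the thread below can be
compared directly. In this minimal sketch, fits stands for
(dev_addr + size - 1 <= dma_mask) and straddles for
range_straddles_page_boundary(phys, size); the function names are
hypothetical. destroy_current() is the condition the patch replaces,
destroy_patch() is the patch, and destroy_inverse() is Boris's
alternative.

```c
#include <assert.h>
#include <stdbool.h>

/* Each function returns true when xen_swiotlb_free_coherent() would call
 * xen_destroy_contiguous_region() under that variant of the check. */
static bool destroy_current(bool fits, bool straddles)
{
	return fits || straddles;
}

static bool destroy_patch(bool fits, bool straddles)
{
	return fits && !straddles;
}

static bool destroy_inverse(bool fits, bool straddles)
{
	return !fits || straddles;
}

/* By De Morgan, the inverse condition is the exact negation of the patch,
 * so the two give opposite answers on every input. */
static void check_negation(void)
{
	for (int f = 0; f <= 1; f++)
		for (int s = 0; s <= 1; s++)
			assert(destroy_inverse(f, s) == !destroy_patch(f, s));
}
```

Every region xen_swiotlb_alloc_coherent() hands out, whether it was lucky
or exchanged, reaches the free path with fits true and straddles false,
assuming a successful exchange, because *dma_handle is then the new
machine address below the mask. For that input destroy_current() and
destroy_patch() both return true (a lucky region is destroyed without
ever being created), while destroy_inverse() returns false (an exchanged
region is never returned, draining Xen's DMA memory). The patch only
changes the outcome when straddles is true, where it skips the hypercall
that the commit message says leads to the panic.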

Any input will be appreciated.

Thanks,
Joe

On 10/25/18 9:28 AM, Joe Jin wrote:
> On 10/25/18 9:10 AM, Boris Ostrovsky wrote:
>> On 10/25/18 10:23 AM, Joe Jin wrote:
>>> On 10/25/18 4:45 AM, Boris Ostrovsky wrote:
>>>> On 10/24/18 10:43 AM, Joe Jin wrote:
>>>>> On 10/24/18 6:57 AM, Boris Ostrovsky wrote:
>>>>>> On 10/24/18 9:02 AM, Konrad Rzeszutek Wilk wrote:
>>>>>>> On Tue, Oct 23, 2018 at 08:09:04PM -0700, Joe Jin wrote:
>>>>>>>> Commit 4855c92dbb7 "xen-swiotlb: fix the check condition for
>>>>>>>> xen_swiotlb_free_coherent" only fixed memory address check condition
>>>>>>>> on xen_swiotlb_free_coherent(), when memory was not physically
>>>>>>>> contiguous and tried to exchanged with Xen via
>>>>>>>> xen_destroy_contiguous_region it will lead kernel panic.
>>>>>>> s/it will lead/which lead to/?
>>>>>>>
>>>>>>>> The correct check condition should be memory is in DMA area and
>>>>>>>> physically contiguous.
>>>>>>> "The correct check condition to make Xen hypercall to revert the
>>>>>>> memory back from its 32-bit pool is if it is:
>>>>>>> 1) Above its DMA bit mask (for example 32-bit devices can only address
>>>>>>> up to 4GB, and we may want 4GB+2K), and
>>>>>> Is this 'and' or 'or'?
>>>>>>
>>>>>>> 2) If it is not physically contiguous
>>>>>>>
>>>>>>> N.B. The logic in the code is inverted, which leads to all sorts of
>>>>>>> confusions."
>>>>>> I would, in fact, suggest making the logic the same in both
>>>>>> xen_swiotlb_alloc_coherent() and xen_swiotlb_free_coherent() to avoid
>>>>>> this. This will involve swapping the if and else branches in the former.
>>>>>>
>>>>>>
>>>>>>> Does that sound correct?
>>>>>>>
>>>>>>>> Thank you Boris for pointing it out.
>>>>>>>>
>>>>>>> Fixes: 4855c92dbb7 ("xen-sw..") ?
>>>>>>>
>>>>>>>> Signed-off-by: Joe Jin <joe.jin@xxxxxxxxxx>
>>>>>>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>>>>>>>> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
>>>>>>> Reported-by: Boris Ostrovs... ?
>>>>>>>> Cc: Christoph Hellwig <hch@xxxxxx>
>>>>>>>> Cc: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
>>>>>>>> Cc: John Sobecki <john.sobecki@xxxxxxxxxx>
>>>>>>>> ---
>>>>>>>> drivers/xen/swiotlb-xen.c | 4 ++--
>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
>>>>>>>> index f5c1af4ce9ab..aed92fa019f9 100644
>>>>>>>> --- a/drivers/xen/swiotlb-xen.c
>>>>>>>> +++ b/drivers/xen/swiotlb-xen.c
>>>>>>>> @@ -357,8 +357,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
>>>>>>>>  	/* Convert the size to actually allocated. */
>>>>>>>>  	size = 1UL << (order + XEN_PAGE_SHIFT);
>>>>>>>>
>>>>>>>> -	if (((dev_addr + size - 1 <= dma_mask)) ||
>>>>>>>> -	    range_straddles_page_boundary(phys, size))
>>>>>>>> +	if ((dev_addr + size - 1 <= dma_mask) &&
>>>>>>>> +	    !range_straddles_page_boundary(phys, size))
>>>>>>>>  		xen_destroy_contiguous_region(phys, order);
>>>>>> I don't think this is right.
>>>>>>
>>>>>> if ((dev_addr + size - 1 > dma_mask) || range_straddles_page_boundary(phys, size))
>>>>>>
>>>>>> No?
>>>>> No, this is not correct.
>>>>>
>>>>> When allocating memory, it first tries to allocate from Dom0/guest memory, then checks
>>>>> whether the physical address is DMA-able and contiguous; if not, it exchanges the
>>>>> memory with the hypervisor. The code is below:
>>>>>
>>>>> 	phys = *dma_handle;
>>>>> 	dev_addr = xen_phys_to_bus(phys);
>>>>> 	if (((dev_addr + size - 1 <= dma_mask)) &&
>>>>> 	    !range_straddles_page_boundary(phys, size))
>>>>> 		*dma_handle = dev_addr;
>>>>> 	else {
>>>>> 		if (xen_create_contiguous_region(phys, order,
>>>>> 						 fls64(dma_mask), dma_handle) != 0) {
>>>>> 			xen_free_coherent_pages(hwdev, size, ret, (dma_addr_t)phys, attrs);
>>>>> 			return NULL;
>>>>> 		}
>>>>> 	}
>>>>>
>>>>>
>>>>> On freeing, we need to return the memory to Xen, otherwise Xen's DMA memory will be
>>>>> used up (this is the issue the patch intends to fix). So when the memory is DMA-able and
>>>>> contiguous, call xen_destroy_contiguous_region() to return the DMA memory to Xen.
>>>> So if you want to allocate 1 byte at address 0 (and dev_addr=phys),
>>>> xen_create_contiguous_region() will not be called. And yet you will call
>>>> xen_destroy_contiguous_region() in the free path.
>>>>
>>>> Is this the expected behavior?
>>> I can't say it's the expected behavior, but I think it's reasonable.
>>
>> I would expect xen_create_contiguous_region() and
>> xen_destroy_contiguous_region() to come in pairs. If a region is
>> created, it needs to be destroyed. And vice versa.
>>
>>
>>>
>>> On allocation, it uses __get_free_pages() to allocate memory. If we are lucky, the memory
>>> is DMA-able and is not exchanged with the hypervisor, but obviously this is not guaranteed.
>>>
>>> And on freeing, it cannot be determined whether the memory came from Dom0/guest's own
>>> memory or from the hypervisor
>>
>>
>> I think it can be. if (!(dev_addr + size - 1 <= dma_mask) ||
>> range_straddles_page_boundary()) then it must have come from the
>> hypervisor, because that's the check we make in
>> xen_swiotlb_alloc_coherent().
>
> This is not true.
>
> dev_addr comes from dma_handle, and *dma_handle is changed by
> xen_create_contiguous_region():
>
> int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
> 				 unsigned int address_bits,
> 				 dma_addr_t *dma_handle)
> {
> ......
> 	success = xen_exchange_memory(1UL << order, 0, in_frames,
> 				      1, order, &out_frame,
> 				      address_bits);
>
> 	/* 3. Map the new extent in place of old pages. */
> 	if (success)
> 		xen_remap_exchanged_ptes(vstart, order, NULL, out_frame);
> 	else
> 		xen_remap_exchanged_ptes(vstart, order, in_frames, 0);
>
> 	spin_unlock_irqrestore(&xen_reservation_lock, flags);
>
> 	*dma_handle = virt_to_machine(vstart).maddr;
> 	return success ? 0 : -ENOMEM;
> }
>
>
> This means the dev_addr checked in xen_swiotlb_alloc_coherent() is not the same
> one checked in xen_swiotlb_free_coherent().
>
> Thanks,
> Joe
>
>
>>
>>
>> -boris
>>
>>
>>> , and if we don't give the memory back to the hypervisor, the hypervisor's DMA
>>> memory will be used up; then DMA requests on Dom0/guest may fail and, worse,
>>> no new guest can be started.
>>>
>>> Thanks,
>>> Joe
>>>
>>>> -boris
>>>>
>>


--
Oracle <http://www.oracle.com>
Joe Jin | Software Development Director
ORACLE | Linux and Virtualization
500 Oracle Parkway Redwood City, CA US 94065