Re: [PATCH v2] Revert "swiotlb: panic if nslabs is too small"

From: Dongli Zhang
Date: Wed Aug 31 2022 - 22:18:57 EST


Hi Yu,

On 8/31/22 5:24 PM, Yu Zhao wrote:
> On Wed, Aug 31, 2022 at 4:20 PM Dongli Zhang <dongli.zhang@xxxxxxxxxx> wrote:
>>
>> Hi Yu,
>>
>> As we discussed in the past, the swiotlb panic on purpose
>
> We should not panic() at all, especially on a platform that has been
> working well since at least 4.14.
>
> Did you check out this link I previously shared with you?
> https://urldefense.com/v3/__https://lore.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld*pnerc4J2Ag990WwAA@xxxxxxxxxxxxxx/__;Kw!!ACWV5N9M2RV99hQ!PXzSLurBv7VqxI1451TV4zO3_-BYj4grk-HYBsXzSnA6nZcXaBzdsQ-rF2DAqlICSRPMt-efYv_Uu2A2CQ$

Thanks for sharing! I used to read that in the past. To be honest I am still
confused on when to use BUG/panic and when to not, as I still see many usage in
some patches.

Just about swiotlb, it may panic in many cases during boot, e.g.,:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1955655

>
>> because the
>> mips/cavium-octeon/dma-octeon.c requests to allocate only PAGE_SIZE swiotlb
>> buffer.
>
> What's wrong with PAGE_SIZE swiotlb?

The swiotlb is like a software IOMMU. When the device supports only <=32-bit DMA
while the IOMMU is not available, it may do the below for each DMA operation:

1. The swiotlb inits and pre-allocates many 32-bit memory during boot.
2. For a DMA read/write, if the page is 64-bit but 32-bit DMA is required, the
kernel swaps the 64-bit DMA page with the pre-allocated swiotlb 32-bit memory.
3. After the DMA is complete, the 32-bit memory is swapped back to swiotlb.

The size of pre-allocated swiotlb is fixed during boot. The DMA will fail later
if the pre-allocated swiotlb is not enough.

That's why I think the objective to set swiotlb=PAGE_SIZE is just to minimize
the swiotlb pre-allocation when it is not required. I do not see a reason to
pre-allocate a PAGE_SIZE for DMA.

There was a discussion between Robin and Christoph whether it is a good idea to
disable swiotlb by allocating a small chunk of memory.

https://lore.kernel.org/all/d5016c1e-55d9-4224-278a-50377d4c6454@xxxxxxx/

>
>> This is smaller than IO_TLB_MIN_SLABS.
>
> IO_TLB_MIN_SLABS is an arbitrary number, and it's inherited from IA64.
> So does the comment below. What exactly is the rationale behind it?

To be honest I do not have an answer as different people may have different
preferences. Some people may prefer to increase IO_TLB_MIN_SLABS, while some may
prefer to decrease.

That value of IO_TLB_MIN_SLABS was there since I started working on swiotlb.

>
>> The below comments mentioned that IO_TLB_MIN_SLABS is the "Minimum IO TLB size
>> to bother booting with".
>>
>> 56 /*
>> 57 * Minimum IO TLB size to bother booting with. Systems with mainly
>> 58 * 64bit capable cards will only lightly use the swiotlb. If we can't
>> 59 * allocate a contiguous 1MB, we're probably in trouble anyway.
>> 60 */
>> 61 #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
>>
>>
>> The arm may create swiotlb conditionally. That is, the swiotlb is not
>> initialized if (1) CONFIG_ARM_LPAE is not set (line 273), or (2) max_pfn <=
>> arm_dma_pfn_limit (line 274).
>>
>> arch/arm/mm/init.c
>>
>> 271 void __init mem_init(void)
>> 272 {
>> 273 #ifdef CONFIG_ARM_LPAE
>> 274 swiotlb_init(max_pfn > arm_dma_pfn_limit, SWIOTLB_VERBOSE);
>> 275 #endif
>> 276
>> 277 set_max_mapnr(pfn_to_page(max_pfn) - mem_map);
>>
>>
>> On x86, the swiotlb is not initialized if the memory is small (> MAX_DMA32_PFN,
>> at line 47), or the secure memory is not required.
>>
>> 44 static void __init pci_swiotlb_detect(void)
>> 45 {
>> 46 /* don't initialize swiotlb if iommu=off (no_iommu=1) */
>> 47 if (!no_iommu && max_possible_pfn > MAX_DMA32_PFN)
>> 48 x86_swiotlb_enable = true;
>> 49
>> 50 /*
>> 51 * Set swiotlb to 1 so that bounce buffers are allocated and used for
>> 52 * devices that can't support DMA to encrypted memory.
>> 53 */
>> 54 if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
>> 55 x86_swiotlb_enable = true;
>> 56
>> 57 /*
>> 58 * Guest with guest memory encryption currently perform all DMA through
>> 59 * bounce buffers as the hypervisor can't access arbitrary VM memory
>> 60 * that is not explicitly shared with it.
>> 61 */
>> 62 if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
>> 63 x86_swiotlb_enable = true;
>> 64 x86_swiotlb_flags |= SWIOTLB_FORCE;
>> 65 }
>> 66 }
>
> Thanks for the code snippets. But I failed to see a point.
>
>> Regardless whether the current patch will be reverted, unless there is specific
>> reason (e.g., those PAGE_SIZE will be used), I do not think it is a good idea to
>> allocate <IO_TLB_MIN_SLABS swiotlb buffer.
>
> For what specific reason? My current understanding is that you want to
> be future/fool-proof. If so, then you got the priority wrong. We need
> to make sure currently working systems continue to work first, then
> future/fool-proof.

My objective was not to prove which option was correct/incorrect. I was just
going to share my understanding of the situation to people on this list, so that
a discussion can continue.

But I do agree with your point on 'priority'.

>
>> I will wait for the suggestion from
>> the swiotlb maintainer.
>
> Chris is on vacation. I sure can wait.
>
> But it sounds like you are unsure about what to do. If so, it's not
> what you claimed "we have already understood everything related to
> swiotlb" previously.

The swiotlb is a component used by many arch (e.g., x86, arm/arm64, mips and
xen) ... while I believe I understand the swiotlb well, I do not understand what
other components are expecting from the swiotlb.

That's why my objective was to trigger the discussion but not to prove my
statement was correct.

The swiotlb is still evolving due to it is more widely used recently (e.g., AMD
SEV). The discussion helps people avoid introducing more issue because there are
many components that may use swiotlb now or in the future.

About what to do, I think it is to either (1) revert the patch or (2) avoid
initializing swiotlb if it is not going to be used. However, I think my thoughts
do not count that much, and that's why I would wait for the suggestion from more
experienced maintainers.

Thank you very much!

Dongli Zhang

>
>> Since I do not have a mips environment, I am not able to test if the below makes
>> any trouble in your situation at arch/mips/cavium-octeon/dma-octeon.c.
>>
>> @@ -234,6 +234,8 @@ void __init plat_swiotlb_setup(void)
>> swiotlbsize = 64 * (1<<20);
>> #endif
>>
>> - swiotlb_adjust_size(swiotlbsize);
>> - swiotlb_init(true, SWIOTLB_VERBOSE);
>> + if (swiotlbsize != PAGE_SIZE) {
>> + swiotlb_adjust_size(swiotlbsize);
>> + swiotlb_init(true, SWIOTLB_VERBOSE);
>> + }
>> }
>
> Please ask the MIPS/Octeon maintainers to check this first.
>