Re: [PATCH v5 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.

From: Alexey Kardashevskiy
Date: Wed Dec 11 2019 - 03:16:02 EST




On 11/12/2019 02:35, Ram Pai wrote:
> On Tue, Dec 10, 2019 at 04:32:10PM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 10/12/2019 16:12, Ram Pai wrote:
>>> On Tue, Dec 10, 2019 at 02:07:36PM +1100, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 07/12/2019 12:12, Ram Pai wrote:
>>>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
>>>>> its parameters. On secure VMs, hypervisor cannot access the contents of
>>>>> this page since it gets encrypted. Hence share the page with the
>>>>> hypervisor, and unshare when done.
>>>>
>>>>
>>>> I thought the idea was to use H_PUT_TCE and avoid sharing any extra
>>>> pages. There is small problem that when DDW is enabled,
>>>> FW_FEATURE_MULTITCE is ignored (easy to fix); I also noticed complains
>>>> about the performance on slack but this is caused by initial cleanup of
>>>> the default TCE window (which we do not use anyway) and to battle this
>>>> we can simply reduce its size by adding
>>>
>>> something that takes hardly any time with H_PUT_TCE_INDIRECT, takes
>>> 13secs per device for H_PUT_TCE approach, during boot. This is with a
>>> 30GB guest. With larger guest, the time will further detoriate.
>>
>>
>> No it will not, I checked. The time is the same for 2GB and 32GB guests-
>> the delay is caused by clearing the small DMA window which is small by
>> the space mapped (1GB) but quite huge in TCEs as it uses 4K pages; and
>> for DDW window + emulated devices the IOMMU page size will be 2M/16M/1G
>> (depends on the system) so the number of TCEs is much smaller.
>
> I cant get your results. What changes did you make to get it?


Get what? I passed "-m 2G" and "-m 32G", got the same time - 13s spent
in clearing the default window and the huge window took a fraction of a
second to create and map.


>>>>
>>>> -global
>>>> spapr-pci-host-bridge.dma_win_size=0x4000000
>>>
>>> This option, speeds it up tremendously. But than should this option be
>>> enabled in qemu by default? only for secure VMs? for both VMs?
>>
>>
>> As discussed in slack, by default we do not need to clear the entire TCE
>> table and we only have to map swiotlb buffer using the small window. It
>> is a guest kernel change only. Thanks,
>
> Can you tell me what code you are talking about here. Where is the TCE
> table getting cleared? What code needs to be changed to not clear it?


pci_dma_bus_setup_pSeriesLP()
iommu_init_table()
iommu_table_clear()
for () tbl->it_ops->get()

We do not really need to clear it there, we only need it for VFIO with
IOMMU SPAPR TCE v1 which reuses these tables but there are
iommu_take_ownership/iommu_release_ownership to clear these tables. I'll
send a patch for this.


> Is the code in tce_buildmulti_pSeriesLP(), the one that does the clear
> aswell?


This one does not need to clear TCEs as this creates a window of known
size and maps it all.

Well, actually, it only maps actual guest RAM, if there are gaps in RAM,
then TCEs for the gaps will have what hypervisor had there (which is
zeroes, qemu/kvm clears it anyway).


> But before I close, you have not told me clearly, what is the problem
> with; 'share the page, make the H_PUT_INDIRECT_TCE hcall, unshare the page'.

Between share and unshare you have a (tiny) window of opportunity to
attack the guest. No, I do not know how exactly.

For example, the hypervisor does a lot of PHB+PCI hotplug-unplug with
64bit devices - each time this will create a huge window which will
share/unshare the same page. No, I do not know how exactly how this can
be exploited either, we cannot rely of what you or myself know today. My
point is that we should not be sharing pages at all unless we really
really have to, and this does not seem to be the case.

But since this seems to an acceptable compromise anyway,

Reviewed-by: Alexey Kardashevskiy <aik@xxxxxxxxx>





> Remember this is the same page that is earmarked for doing
> H_PUT_INDIRECT_TCE, not by my patch, but its already earmarked by the
> existing code. So it not some random buffer that is picked. Second
> this page is temporarily shared and unshared, it does not stay shared
> for life. It does not slow the boot. it does not need any
> special command line options on the qemu.
>> Shared pages technology was put in place, exactly for the purpose of
> sharing data with the hypervisor. We are using this technology exactly
> for that purpose. And finally I agreed with your concern of having
> shared pages staying around. Hence i addressed that concern, by
> unsharing the page. At this point, I fail to understand your concern.




--
Alexey