Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.

From: Alexey Kardashevskiy
Date: Mon Dec 02 2019 - 21:15:15 EST




On 03/12/2019 13:08, Ram Pai wrote:
> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 02/12/2019 17:45, Ram Pai wrote:
>>> The H_PUT_TCE_INDIRECT hcall takes a page filled with TCE entries as
>>> one of its parameters. One page is dedicated per cpu, for the lifetime
>>> of the kernel, for this purpose. On secure VMs, when the hypervisor
>>> accesses this page it sees only encrypted TCE entries, but it needs
>>> the unencrypted entries to update the TCE table accordingly. There is
>>> nothing secret or sensitive about these entries. Hence share the page
>>> with the hypervisor.
>>
>> This unsecures a page in the guest at a random place, which creates an
>> additional attack surface; it is hard to exploit, indeed, but
>> nevertheless it is there.
>> A safer option would be not to use the
>> hcall-multi-tce hypertas option (which translates to FW_FEATURE_MULTITCE
>> in the guest).
>
>
> Hmm... How do we not use it? AFAICT the hcall-multi-tce option gets
> enabled automatically when the IOMMU option is enabled.

It is advertised by QEMU but the guest does not have to use it.

> This happens even
> on a normal VM when IOMMU is enabled.
>
>
>>
>> Also what is this for anyway?
>
> This is for sending indirect TCE entries to the hypervisor.
> The hypervisor must be able to read those TCE entries, so that it can
> use those entries to populate the TCE table with the correct mappings.
>
>> If I understand things right, you cannot
>> map any random guest memory; you should only be mapping that 64MB-ish
>> bounce buffer array, but 1) I do not see that happening (I may have
>> missed it) and 2) it should be done once, and it takes little time for
>> whatever memory size we allow for bounce buffers anyway. Thanks,
>
> Any random guest memory can be shared by the guest.

Yes, but we do not want this to be this random. I thought the whole idea
of swiotlb was to restrict the amount of shared memory to the bare
minimum; what am I missing?

> Maybe you are confusing this with the SWIOTLB bounce buffers used by
> PCI devices to transfer data to the hypervisor?

Is not this for pci+swiotlb? The cover letter suggests it is for
virtio-scsi-_pci_ with iommu_platform=on which makes it a normal pci
device just like emulated XHCI. Thanks,




>
>>
>>
>>>
>>> Signed-off-by: Ram Pai <linuxram@xxxxxxxxxx>
>>> ---
>>> arch/powerpc/platforms/pseries/iommu.c | 23 ++++++++++++++++++++---
>>> 1 file changed, 20 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index 6ba081d..0720831 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -37,6 +37,7 @@
>>> #include <asm/mmzone.h>
>>> #include <asm/plpar_wrappers.h>
>>> #include <asm/svm.h>
>>> +#include <asm/ultravisor.h>
>>>
>>> #include "pseries.h"
>>>
>>> @@ -179,6 +180,23 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
>>>
>>> static DEFINE_PER_CPU(__be64 *, tce_page);
>>>
>>> +/*
>>> + * Allocate a tce page. If secure VM, share the page with the hypervisor.
>>> + *
>>> + * NOTE: the TCE page is shared with the hypervisor explicitly and remains
>>> + * shared for the lifetime of the kernel. It is implicitly unshared at kernel
>>> + * shutdown through a UV_UNSHARE_ALL_PAGES ucall.
>>> + */
>>> +static __be64 *alloc_tce_page(void)
>>> +{
>>> + __be64 *tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
>>> +
>>> + if (tcep && is_secure_guest())
>>> + uv_share_page(PHYS_PFN(__pa(tcep)), 1);
>>> +
>>> + return tcep;
>>> +}
>>> +
>>> static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>>> long npages, unsigned long uaddr,
>>> enum dma_data_direction direction,
>>> @@ -206,8 +224,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>>> * from iommu_alloc{,_sg}()
>>> */
>>> if (!tcep) {
>>> - tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
>>> - /* If allocation fails, fall back to the loop implementation */
>>> + tcep = alloc_tce_page();
>>> if (!tcep) {
>>> local_irq_restore(flags);
>>> return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
>>> @@ -405,7 +422,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
>>> tcep = __this_cpu_read(tce_page);
>>>
>>> if (!tcep) {
>>> - tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
>>> + tcep = alloc_tce_page();
>>> if (!tcep) {
>>> local_irq_enable();
>>> return -ENOMEM;
>>>
>>
>> --
>> Alexey
>

--
Alexey