Re: [PATCH v2] swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests.

From: Ashish Kalra
Date: Wed Jun 24 2020 - 03:05:19 EST


On Wed, Jun 24, 2020 at 12:23:57AM +0000, Ashish Kalra wrote:
> Hello Konrad,
>
> On Tue, Jun 23, 2020 at 09:38:43AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Mon, Apr 27, 2020 at 06:53:18PM +0000, Ashish Kalra wrote:
> > > Hello Konrad,
> > >
> > > On Mon, Mar 30, 2020 at 10:25:51PM +0000, Ashish Kalra wrote:
> > > > Hello Konrad,
> > > >
> > > > On Tue, Mar 03, 2020 at 12:03:53PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > On Tue, Feb 04, 2020 at 07:35:00PM +0000, Ashish Kalra wrote:
> > > > > > Hello Konrad,
> > > > > >
> > > > > > Looking fwd. to your feedback regarding support of other memory
> > > > > > encryption architectures such as Power, S390, etc.
> > > > > >
> > > > > > Thanks,
> > > > > > Ashish
> > > > > >
> > > > > > On Fri, Jan 24, 2020 at 11:00:08PM +0000, Ashish Kalra wrote:
> > > > > > > On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > > >
> > > > > > > > > Additional memory calculations based on # of PCI devices and
> > > > > > > > > their memory ranges will make it more complicated with so
> > > > > > > > > many other permutations and combinations to explore, it is
> > > > > > > > > essential to keep this patch as simple as possible by
> > > > > > > > > adjusting the bounce buffer size simply by determining it
> > > > > > > > > from the amount of provisioned guest memory.
> > > > > > > >>
> > > > > > > >> Please rework the patch to:
> > > > > > > >>
> > > > > > > >> - Use a log solution instead of the multiplication.
> > > > > > > >> Feel free to cap it at a sensible value.
> > > > > > >
> > > > > > > Ok.
> > > > > > >
> > > > > > > >>
> > > > > > > >> - Also the code depends on SWIOTLB calling in to the
> > > > > > > >> adjust_swiotlb_default_size which looks wrong.
> > > > > > > >>
> > > > > > > >> You should not adjust io_tlb_nslabs from swiotlb_size_or_default.
> > > > > > >
> > > > > > > >> That function's purpose is to report a value.
> > > > > > > >>
> > > > > > > >> - Make io_tlb_nslabs be visible outside of the SWIOTLB code.
> > > > > > > >>
> > > > > > > >> - Can you utilize the IOMMU_INIT APIs and have your own detect which would
> > > > > > > > >> modify the io_tlb_nslabs (and set swiotlb=1?).
> > > > > > >
> > > > > > > This seems to be a nice option, but then IOMMU_INIT APIs are
> > > > > > > x86-specific and this swiotlb buffer size adjustment is also needed
> > > > > > > for other memory encryption architectures like Power, S390, etc.
> > > > >
> > > > > Oh dear. That I hadn't considered.
> > > > > > >
> > > > > > > >>
> > > > > > > >> Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
> > > > > > > > >> perhaps add in this code ? Albeit it really should be in its own
> > > > > > > >> file, not in arch/x86/kernel/pci-swiotlb.c
> > > > > > >
> > > > > > > Actually, we piggyback on pci_swiotlb_detect_override which sets
> > > > > > > swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
> > > > > > > forces swiotlb on, but again this is all x86 architecture specific.
> > > > >
> > > > > Then it looks like the best bet is to do it from within swiotlb_init?
> > > > > We really can't do it from swiotlb_size_or_default - that function
> > > > > should just return a value and nothing else.
> > > > >
> > > >
> > > > Actually, we need to do it in swiotlb_size_or_default() as this gets called by
> > > > reserve_crashkernel_low() in arch/x86/kernel/setup.c and used to
> > > > reserve low crashkernel memory. If we adjust swiotlb size later in
> > > > swiotlb_init() which gets called later than reserve_crashkernel_low(),
> > > > then any swiotlb size changes/expansion will conflict/overlap with the
> > > > low memory reserved for crashkernel.
> > > >
> > > and will also potentially cause SWIOTLB buffer allocation failures.
> > >
> > > Do you have any feedback, comments on the above ?
> >
> >
> > The init boot chain looks like this:
> >
> > initmem_init
> > pci_iommu_alloc
> > -> pci_swiotlb_detect_4gb
> > -> swiotlb_init
> >
> > reserve_crashkernel
> > reserve_crashkernel_low
> > -> swiotlb_size_or_default
> > ..
> >
> >
> > (rootfs code):
> > pci_iommu_init
> > -> a bunch of the other IOMMU late_init code gets called..
> > -> pci_swiotlb_late_init
> >
> > I have to say I am lost as to how your patch fixes "If we adjust swiotlb
> > size later .. then any swiotlb size .. will overlap with the low memory
> > reserved for crashkernel"?
> >
>
> Actually as per the boot flow :
>
> setup_arch() calls reserve_crashkernel() and pci_iommu_alloc() is
> invoked through mm_init()/mem_init() and not via initmem_init().
>
> start_kernel:
> ...
> setup_arch()
> reserve_crashkernel
> reserve_crashkernel_low
> -> swiotlb_size_or_default
>
> ...
> ...
> mm_init()
> mem_init()
> pci_iommu_alloc
> -> pci_swiotlb_detect_4gb
> -> swiotlb_init
>
> So as per the above boot flow, reserve_crashkernel() can get called
> before swiotlb_detect/init, and hence, if we don't fixup or adjust
> the SWIOTLB buffer size in swiotlb_size_or_default() then crash kernel
> will reserve memory which will conflict/overlap with any SWIOTLB bounce
> buffer allocated memory (adjusted or fixed up later).
>
> Therefore, we need to adjust/fixup SWIOTLB bounce buffer memory in
> swiotlb_size_or_default() function itself, before swiotlb detect/init
> functions get invoked.
>

To add to the above: swiotlb_size_or_default() looks like an interface
function that reports the SWIOTLB bounce buffer size to components that
are initialized before swiotlb detect/init. Those components can then
reserve or allocate their own memory knowing how much the SWIOTLB bounce
buffers are going to use. Therefore, any fixup or adjustment to the
SWIOTLB buffer size needs to be made as part of
swiotlb_size_or_default().
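
The ordering argument can be modelled with a minimal user-space sketch.
This is not kernel code: every sketch_* name is hypothetical, and the
128 MB "adjusted" size is only an illustrative placeholder for whatever
the SEV adjustment would compute. The 64 MB value is the swiotlb default.
The sketch compares a size reporter that ignores the adjustment with one
that applies it, as seen by an early caller such as
reserve_crashkernel_low().

```c
#include <assert.h>

/* swiotlb default bounce buffer size: 64 MB. */
#define SKETCH_DEFAULT_SIZE (64UL << 20)

/*
 * Hypothetical stand-in for the SEV adjustment. The 128 MB figure is
 * an assumption for illustration, not the value the patch computes.
 */
static unsigned long sketch_adjusted_size(int mem_encrypt_active)
{
	return mem_encrypt_active ? (128UL << 20) : SKETCH_DEFAULT_SIZE;
}

/* Variant A: the reporter only returns the default; the adjustment is
 * deferred to init time, after early callers have already planned. */
unsigned long sketch_report_unadjusted(int mem_encrypt_active)
{
	(void)mem_encrypt_active;
	return SKETCH_DEFAULT_SIZE;
}

/* Variant B: the reporter already returns the adjusted size. */
unsigned long sketch_report_adjusted(int mem_encrypt_active)
{
	return sketch_adjusted_size(mem_encrypt_active);
}

/*
 * Model of the boot order: the early caller (setup_arch() time) asks the
 * reporter how much swiotlb will use, then swiotlb init (mem_init() time)
 * allocates the adjusted size. Returns the number of bytes the early
 * caller did not account for.
 */
unsigned long sketch_shortfall(unsigned long (*report)(int),
			       int mem_encrypt_active)
{
	unsigned long planned = report(mem_encrypt_active);
	unsigned long actual = sketch_adjusted_size(mem_encrypt_active);

	return actual > planned ? actual - planned : 0;
}
```

With variant A and memory encryption active, the early caller is short
by the difference between the adjusted and default sizes. With variant B
the shortfall is zero, which is why the adjustment has to be visible
through the reporting function.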

Thanks,
Ashish

> > Or are you saying that 'reserve_crashkernel_low' is the _culprit_ and it
> > is the one changing the size? And hence it modifying the swiotlb size
> > will fix this problem? Aka _before_ all the other IOMMU get their hand
> > on it?
> >
> > If so why not create an
> > IOMMU_INIT(crashkernel_adjust_swiotlb,pci_swiotlb_detect_override,
> > NULL, NULL);
> >
> > And crashkernel_adjust_swiotlb would change the size of swiotlb buffer
> > if conditions are found to require it.
> >
> > You also may want to put a #define DEBUG in arch/x86/kernel/pci-iommu_table.c
> > to check out whether the tree structure of IOMMU entries is correct.
> >
> >
> >
> > But still I am lost - if say the AMD one does decide for unknown reason
> > to expand the SWIOTLB you are still stuck with the 'overlap with
> > the low memory reserved' or so.
> >
> > Perhaps add a late_init that gets called as the last one to validate
> > this ? And maybe if the swiotlb gets turned off you also take proper
> > steps?
> >
> > > As such i feel, this patch is complete otherwise and can be included as
> > > it is.
> > >
> > > Thanks,
> > > Ashish