I try to solve this problem by creating a new kernel thread, "kccd", to populate the TLB buffer in the background.
Specifically,
1. A new kernel thread is created with the help of "arch_initcall", and this kthread is responsible for memory allocation and setting memory attributes (private or shared);
2. The "swiotlb_tbl_map_single" routine only uses the spin_lock-protected TLB buffers pre-allocated by the kthread;
a) which actually includes ONE memory allocation brought by xarray insertion "__xa_insert__".
That already seems dangerous, with all the usual problems of memory allocations in IO paths. Normally code at least uses a mempool to avoid the worst deadlock potential.
No, this cannot guarantee that we always have sufficient TLB caches, so we can also hit a "No memory for cc-swiotlb buffer" warning.
3. After each allocation, the water level of TLB resources will be checked. If the current TLB resources are found to be lower than the preset value (half of the watermark), the kthread will be awakened to fill them.
4. The TLB buffer allocation in the kthread is batched to "(MAX_ORDER_NR_PAGES << PAGE_SHIFT)" to reduce the spin_lock hold time and the number of calls to set_memory_decrypted().
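
For readers following along, the watermark scheme in points 2 to 4 can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the code from the patch: the names slot_pool, pool_take and pool_refill are hypothetical, the spin_lock is left out because the model is single-threaded, and "waking the kthread" is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the pool; not the cc-swiotlb patch code. */
struct slot_pool {
    size_t free_slots;    /* slots currently available to the map path */
    size_t watermark;     /* target fill level */
    size_t batch;         /* slots added per refill step */
    bool   refill_wanted; /* stands in for "wake up the kthread" */
};

/* Map path: take one pre-allocated slot; it never allocates by itself. */
static bool pool_take(struct slot_pool *p)
{
    if (p->free_slots == 0)
        return false; /* corresponds to the "No memory" warning case */

    p->free_slots--;

    /* Water-level check after each allocation: below half the watermark. */
    if (p->free_slots < p->watermark / 2)
        p->refill_wanted = true;

    return true;
}

/* Background thread body: refill in whole batches until the watermark
 * is reached. Returns the number of slots added. */
static size_t pool_refill(struct slot_pool *p)
{
    size_t added = 0;

    assert(p->batch > 0); /* a zero batch would never reach the watermark */

    while (p->free_slots < p->watermark) {
        p->free_slots += p->batch;
        added += p->batch;
    }
    p->refill_wanted = false;
    return added;
}
```

Note that pool_take() can still fail if the pool drains before pool_refill() gets to run, which is the case the questions below are about.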
Okay, but does this guarantee that it will never run out of memory?
It seems difficult to make such guarantees. What happens for example if the background thread gets starved by something higher priority?
Or if the allocators have such high bandwidth that they can overwhelm any reasonable background thread?
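
As a rough illustration of that concern, here is a toy tick-based model in userspace C. It is a sketch with made-up parameters, not a measurement: simulate_failures and all of its arguments are hypothetical, and a "starved" refill thread is modelled simply as one that only runs every refill_period ticks.

```c
#include <stddef.h>

/* Count failed map attempts over `ticks` steps (hypothetical model).
 * Each tick the consumers request `demand` slots. The refiller runs only
 * every `refill_period` ticks and adds `batch` slots when it does. */
static size_t simulate_failures(size_t initial, size_t demand, size_t batch,
                                size_t refill_period, size_t ticks)
{
    size_t free_slots = initial;
    size_t failures = 0;

    for (size_t t = 1; t <= ticks; t++) {
        /* The background thread only gets CPU time on some ticks. */
        if (refill_period != 0 && t % refill_period == 0)
            free_slots += batch;

        /* Consumers take what they can; the rest are failed requests. */
        size_t granted = demand < free_slots ? demand : free_slots;
        free_slots -= granted;
        failures += demand - granted;
    }
    return failures;
}
```

With the refiller keeping pace (refill_period of 1 and batch equal to demand) the model never fails a request; once the refiller is delayed or the demand exceeds the batch rate, the initial reserve only postpones the first failure.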
-Andi