[ 4.319253] iommu: Adding device 0000:06:00.2 to group 5
[ 4.325869] iommu: Adding device 0000:20:01.0 to group 15
[ 4.332648] iommu: Adding device 0000:20:02.0 to group 16
[ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0
[ 4.350251] swapper/0 cpuset=/ mems_allowed=0
[ 4.354618] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.57.mx64.282 #1
[ 4.355612] Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.9.3 06/25/2019
[ 4.355612] Call Trace:
[ 4.355612] dump_stack+0x46/0x5b
[ 4.355612] dump_header+0x6b/0x289
[ 4.355612] out_of_memory+0x470/0x4c0
[ 4.355612] __alloc_pages_nodemask+0x970/0x1030
[ 4.355612] cache_grow_begin+0x7d/0x520
[ 4.355612] fallback_alloc+0x148/0x200
[ 4.355612] kmem_cache_alloc_trace+0xac/0x1f0
[ 4.355612] init_iova_domain+0x112/0x170
[ 4.355612] amd_iommu_domain_alloc+0x138/0x1a0
[ 4.355612] iommu_group_get_for_dev+0xc4/0x1a0
[ 4.355612] amd_iommu_add_device+0x13a/0x610
[ 4.355612] add_iommu_group+0x20/0x30
[ 4.355612] bus_for_each_dev+0x76/0xc0
[ 4.355612] bus_set_iommu+0xb6/0xf0
[ 4.355612] amd_iommu_init_api+0x112/0x132
[ 4.355612] state_next+0xfb1/0x1165
[ 4.355612] amd_iommu_init+0x1f/0x67
[ 4.355612] pci_iommu_init+0x16/0x3f
...
[ 4.670295] Unreclaimable slab info:
...
[ 4.857565] kmalloc-2048 59164KB 59164KB
Change IOVA_MAG_SIZE from 128 to 127 so that the size of 'iova_magazine'
is exactly 1024 bytes and no memory is wasted.
[1]. https://lkml.org/lkml/2019/8/12/266
Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
---
drivers/iommu/iova.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145b..27634ddd9b904 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -614,7 +614,12 @@ EXPORT_SYMBOL_GPL(reserve_iova);
* dynamic size tuning described in the paper.
*/
-#define IOVA_MAG_SIZE 128
+/*
+ * As kmalloc's buffer size is fixed to power of 2, 127 is chosen to
+ * assure size of 'iova_magzine' to be 1024 bytes, so that no memory
Typo: iova_magazine
+ * will be wasted.
+ */
+#define IOVA_MAG_SIZE 127
I do wonder if we will see some strange new behaviour since IOVA_FQ_SIZE % IOVA_MAG_SIZE != 0 now...
I doubt it - even if a flush queue does happen to be entirely full of equal-sized IOVAs, it seems even less likely that both of a CPU's loaded magazines would also be perfectly empty when it comes to dump a full fq, so in practice I don't see this making any appreciable change to the likelihood of spilling back to the depot. In fact, the smaller the magazines get, the less time would be spent flushing the depot back to the rbtree, which is where your interesting workload falls off the cliff and never catches back up with the fq timer, so at some point it might even improve (unless it's also already close to the point where smaller caches would bottleneck allocation)... It might be interesting to experiment with a wider range of magazine sizes if you had the time and inclination.