On Thu, May 12, 2016 at 09:11:33AM +0800, Miao Xie wrote:
> My box has 48 cores and 188GB memory, but I set
> vm.dirty_background_bytes = 268435456
> vm.dirty_bytes = 536870912
> If I set vm.dirty_background_bytes and vm.dirty_bytes to large values (vm.dirty_background_bytes = 3GB,
> vm.dirty_bytes = 4GB), fio throughput is more than 1500MB/s. If I then reset them to the original
> values (the ones above), the throughput drops to 500MB/s.
> According to my debugging, fio slept for 1ms every time it dirtied a page (balance dirty pages) when
> the throughput was down to 4MB/s. I think it might be a bug in dirty throttling when cgroup writeback is enabled.
Heh, so, for cgroups, the absolute byte limits can't be applied directly
and are converted to a percentage value before being applied. You're
specifying 0.27% for the threshold. Unfortunately, the ratio is stored
as a whole-number percentage and 0.27% becomes 0, so your
cgroups are always over the limit and being throttled.
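To make the truncation concrete, here is a rough standalone sketch of the pre-patch conversion. It is not the kernel code: the `demo_` names are made up for illustration, 4KiB pages are assumed, and the whole 188GB is assumed to be globally available (49283072 pages), which overstates the real figure somewhat.

```c
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define DEMO_PAGE_SIZE 4096ULL

/* Round-up integer division, the same idea as the kernel's DIV_ROUND_UP(). */
uint64_t demo_div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Hypothetical helper mirroring the pre-patch conversion: turn a byte
 * limit into a whole-number percentage of globally available pages,
 * capped at 100. The integer division drops any fractional percent.
 */
uint64_t demo_bytes_to_percent(uint64_t bytes, uint64_t global_avail_pages)
{
	uint64_t ratio = demo_div_round_up(bytes, DEMO_PAGE_SIZE) * 100 /
			 global_avail_pages;

	return ratio < 100 ? ratio : 100;
}
```

Under those assumptions, `demo_bytes_to_percent(536870912, 49283072)` and `demo_bytes_to_percent(268435456, 49283072)` both evaluate to 0, while the 4GB and 3GB settings come out as 2 and 1, which would match the large settings behaving fine.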
Can you please see whether the following patch fixes the issue?
Thanks.
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 999792d..a455a21 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -369,8 +369,9 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
 	struct dirty_throttle_control *gdtc = mdtc_gdtc(dtc);
 	unsigned long bytes = vm_dirty_bytes;
 	unsigned long bg_bytes = dirty_background_bytes;
-	unsigned long ratio = vm_dirty_ratio;
-	unsigned long bg_ratio = dirty_background_ratio;
+	/* convert ratios to per-PAGE_SIZE for higher precision */
+	unsigned long ratio = (vm_dirty_ratio * PAGE_SIZE) / 100;
+	unsigned long bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100;
 	unsigned long thresh;
 	unsigned long bg_thresh;
 	struct task_struct *tsk;
@@ -382,26 +383,28 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
 		/*
 		 * The byte settings can't be applied directly to memcg
 		 * domains.  Convert them to ratios by scaling against
-		 * globally available memory.
+		 * globally available memory.  As the ratios are in
+		 * per-PAGE_SIZE, they can be obtained by dividing bytes by
+		 * pages.
 		 */
 		if (bytes)
-			ratio = min(DIV_ROUND_UP(bytes, PAGE_SIZE) * 100 /
-				    global_avail, 100UL);
+			ratio = min(DIV_ROUND_UP(bytes, global_avail),
+				    PAGE_SIZE);
 		if (bg_bytes)
-			bg_ratio = min(DIV_ROUND_UP(bg_bytes, PAGE_SIZE) * 100 /
-				       global_avail, 100UL);
+			bg_ratio = min(DIV_ROUND_UP(bg_bytes, global_avail),
+				       PAGE_SIZE);
 		bytes = bg_bytes = 0;
 	}
 
 	if (bytes)
 		thresh = DIV_ROUND_UP(bytes, PAGE_SIZE);
 	else
-		thresh = (ratio * available_memory) / 100;
+		thresh = (ratio * available_memory) / PAGE_SIZE;
 
 	if (bg_bytes)
 		bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE);
 	else
-		bg_thresh = (bg_ratio * available_memory) / 100;
+		bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE;
 
 	if (bg_thresh >= thresh)
 		bg_thresh = thresh / 2;
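
For reference, a rough userspace sketch of the per-PAGE_SIZE arithmetic the patch switches to. Again this is only an illustration: the `sketch_` names are hypothetical, 4KiB pages are assumed, and 49283072 pages (all of 188GB) stands in for both the globally available memory and the domain's available memory.

```c
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: express a byte limit as a ratio in units of
 * 1/PAGE_SIZE by dividing bytes by globally available pages (rounded
 * up), capped at PAGE_SIZE, which corresponds to 100%.
 */
uint64_t sketch_bytes_to_scaled_ratio(uint64_t bytes,
				      uint64_t global_avail_pages)
{
	uint64_t ratio = (bytes + global_avail_pages - 1) / global_avail_pages;

	return ratio < SKETCH_PAGE_SIZE ? ratio : SKETCH_PAGE_SIZE;
}

/* Hypothetical helper: apply a 1/PAGE_SIZE ratio to a domain's pages. */
uint64_t sketch_thresh_pages(uint64_t scaled_ratio, uint64_t avail_pages)
{
	return scaled_ratio * avail_pages / SKETCH_PAGE_SIZE;
}
```

With those numbers, vm.dirty_bytes = 536870912 gives a scaled ratio of 11 (11/4096 is about 0.27%) and a threshold of 132352 pages, and vm.dirty_background_bytes = 268435456 gives 6 and 72192 pages, so neither limit collapses to 0.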