Re: Write throughput impaired by touching dirty_ratio

From: Vlastimil Babka
Date: Thu Jun 25 2015 - 05:30:40 EST


On 06/25/2015 12:26 AM, Mark Hills wrote:
On Wed, 24 Jun 2015, Vlastimil Babka wrote:

[add some CC's]

On 06/19/2015 05:16 PM, Mark Hills wrote:

Hmm, so the only thing that dirty_ratio_handler() changes except the
vm_dirty_ratio itself, is ratelimit_pages through writeback_set_ratelimit(). So
I assume the problem is with ratelimit_pages. There's num_online_cpus() used in
the calculation, which I think would differ between the initial system state
(where we are called by page_writeback_init()) and later when all CPU's are
onlined. But I don't see CPU onlining code updating the limit (unlike memory
hotplug which does that), so that's suspicious.

Another suspicious thing is that global_dirty_limits() looks at current
process's flag. It seems odd to me that the process calling the sysctl would
determine a value global to the system.

Yes, I also spotted this. The fragment of code is:

tsk = current;
if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
background += background / 4;
dirty += dirty / 4;
}

It seems to imply the code was not always used from the /proc interface.
It's relevant in a moment...

If you are brave enough (and have kernel configured properly and with
debuginfo),

I'm brave... :) I hadn't seen this tool before, thanks for introducing me
to it, I will use it more now, I'm sure.

Ok I admit I didn't expect so much outcome from my suggestion. Good job :)

you can verify how value of ratelimit_pages variable changes on the live
system, using the crash tool. Just start it, and if everything works,
you can inspect the live system. It's a bit complicated since there are
two static variables called "ratelimit_pages" in the kernel so we can't
print them easily (or I don't know how). First we have to get the
variable address:

crash> sym ratelimit_pages
ffffffff81e67200 (d) ratelimit_pages
ffffffff81ef4638 (d) ratelimit_pages

One will be absurdly high (probably less on your 32bit) so it's not the one we want:

crash> rd -d ffffffff81ef4638 1
ffffffff81ef4638: 4294967328768

The second will have a smaller value:
(my system after boot with dirty ratio = 20)
crash> rd -d ffffffff81e67200 1
ffffffff81e67200: 1577

(after changing to 21)
crash> rd -d ffffffff81e67200 1
ffffffff81e67200: 1570

(after changing back to 20)
crash> rd -d ffffffff81e67200 1
ffffffff81e67200: 1496

In my case there's only one such symbol (perhaps because this kernel
config is quite slimmed down?)

crash> sym ratelimit_pages
c148b618 (d) ratelimit_pages

(bootup with dirty_ratio 20)
crash> rd -d ratelimit_pages
c148b618: 78

With just one symbol you can use
crash> p ratelimit_pages

This will take the type properly into account, while rd will print full 32bit/64bit depending on your kernel, which might be larger than the actual variable. But if there are more symbols of same name, "p" will somehow randomly pick one of them and don't even warn about it.

[snip]




Thanks, I hope you find this useful.

Yes, thanks, nice analysis. Since Michal already replied and has more experience with the reclaim code and dirty throttling, I won't try adding more.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/