Re: memcg writeout throttling, was: [patch 4/4] memcg: use nativeword page statistics counters

From: Greg Thelen
Date: Mon Nov 08 2010 - 14:01:27 EST


On Mon, Nov 8, 2010 at 7:45 AM, Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> On Mon, Nov 08, 2010 at 05:37:16PM +0800, Johannes Weiner wrote:
>> On Mon, Nov 08, 2010 at 09:07:35AM +0900, Minchan Kim wrote:
>> > BTW, let me ask a question.
>> > dirty_writeback_pages seems to be depends on mem_cgroup_page_stat's
>> > result(ie, negative) for separate global and memcg.
>> > But mem_cgroup_page_stat could return negative value by per-cpu as
>> > well as root cgroup.
>> > If I understand right, Isn't it a problem?
>>
>> Yes, the numbers are not reliable and may be off by some.  It appears
>> to me that the only sensible interpretation of a negative sum is to
>> assume zero, though.  So to be honest, I don't understand the fallback
>> to global state when the local state fluctuates around low values.
>
> Agreed. It does not make sense to compare values from different domains.
>
> The bdi stats use percpu_counter_sum_positive() which never return
> negative values. It may be suitable for memcg page counts, too.
>
>> This function is also only used in throttle_vm_writeout(), where the
>> outcome is compared to the global dirty threshold.  So using the
>> number of writeback pages _from the current cgroup_ and falling back
>> to global writeback pages when this number is low makes no sense to me
>> at all.
>>
>> I looks like it should rather compare the cgroup state with the cgroup
>> limit, and the global state with the global limit.
>
> Right.
>
>> Can somebody explain the reasoning behind this?  And in case it makes
>> sense after all, put a comment into this function?
>
> It seems a better match to test sc->mem_cgroup rather than
> mem_cgroup_from_task(current). The latter could make mismatches. When
> someone is changing the memcg limits and hence triggers memcg
> reclaims, the current task is actually the (unrelated) shell. It's
> also possible for the memcg task to trigger _global_ direct reclaim.

Good point. I am writing a patch that will pass mem_cgroup from
sc->mem_cgroup into mem_cgroup_page_stat() rather than using
mem_cgroup_from_task(current). I will post this patch in a few hours.

I will also fix the negative value issue in mem_cgroup_page_stat().
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/