Re: [PATCH v4 -mm] rework mem_cgroup iterator

From: Michal Hocko
Date: Wed Mar 13 2013 - 11:08:49 EST


Hi Andrew,
it seems that there were no other objections[1] to this version of the
patchset. Could you take it into your tree and target it for 3.10?

---
[1] The only one from Johannes (https://lkml.org/lkml/2013/2/8/379) has
been addressed

On Thu 14-02-13 14:26:30, Michal Hocko wrote:
> Hi,
> this is a fourth iteration of the patch series previously posted here:
> http://lkml.org/lkml/2013/1/3/293. I am adding Andrew to the CC as I
> feel this is getting to the acceptable state finally. It still need
> review as some things changed a lot since the last version but we are
> getting there...
>
> The patch set tries to make mem_cgroup_iter saner in the way how it
> walks hierarchies. css->id based traversal is far from being ideal as
> it is not deterministic because it depends on the creation ordering.
> Additional to that css_id is considered a burden for cgroup maintainers
> because it is quite some code and memcg is the last user of it. After
> this series only the swap accounting uses css_id but that one will
> follow up later.
>
> Diffstat (if we exclude removed/added comments) looks quite
> promissing. We got rid of some code:
> $ git diff mmotm... | grep -v "^[+-][[:space:]]*[/ ]\*" | diffstat
> b/include/linux/cgroup.h | 3 ---
> kernel/cgroup.c | 33 ---------------------------------
> mm/memcontrol.c | 4 +++-
> 3 files changed, 3 insertions(+), 37 deletions(-)
>
> The first patch is just preparatory and it changes when we release css
> of the previously returned memcg. Nothing controlversial.
>
> The second patch is the core of the patchset and it replaces css_get_next
> based on css_id by the generic cgroup pre-order. This brings some
> chalanges for the last visited group caching during the reclaim
> (mem_cgroup_per_zone::reclaim_iter). We have to use memcg pointers
> directly now which means that we have to keep a reference to those
> groups' css to keep them alive.
> I also folded iter_lock introduced by https://lkml.org/lkml/2013/1/3/295
> in the previous version into this patch. Johannes felt the race I
> was describing should be mostly harmless and I haven't been able to
> trigger it so the lock doesn't deserve its own patch. It is still needed
> temporarily, though, because the reference counting on iter->last_visited
> depends on it. It will go away with the next patch.
>
> The next patch fixups an unbounded cgroup removal holdoff caused
> by the elevated css refcount. The issue has been observed by
> Ying Han. Johannes wasn't impressed by the previous version of the fix
> (https://lkml.org/lkml/2013/2/8/379) which cleaned up pending references
> during mem_cgroup_css_offline when a group is removed. He has suggested
> a different way when the iterator checks whether a cached memcg is
> still valid or no. More on that in the patch but the basic idea is that
> every memcg tracks the number removed subgroups and iterator records
> this number when a group is cached. These numbers are checked before
> iter->last_visited is about to be used and the iteration is restarted if
> it is invalid.
>
> The fourth and fifth patches are an attempt for simplification of the
> mem_cgroup_iter. css juggling is removed and the iteration logic is
> moved to a helper so that the reference counting and iteration are
> separated.
>
> The last patch just removes css_get_next as there is no user for it any
> longer.
>
> I have dropped Acks from patches that needed a rework since the last
> time so please have a look at them again (especially 1-4). I hope I
> haven't screwed anything during the rebase.
>
> My testing looked as follows:
> A (use_hierarchy=1, limit_in_bytes=150M)
> /|\
> 1 2 3
>
> Children groups were created so that the number is never higher than
> 3 and their limits were random between 50-100M. Each group hosts a
> kernel build (starting with tar -xf so the tree is not shared and make
> -jNUM_CPUs/3) and terminated after random time - up to 5 minutes) and
> then it is removed.
> This should exercise both leaf and hierarchical reclaim as well as
> races with cgroup removals and debugging messages I added on top proved
> that. 100 groups were created during the test.
>
> Shortlog says:
> Michal Hocko (6):
> memcg: keep prev's css alive for the whole mem_cgroup_iter
> memcg: rework mem_cgroup_iter to use cgroup iterators
> memcg: relax memcg iter caching
> memcg: simplify mem_cgroup_iter
> memcg: further simplify mem_cgroup_iter
> cgroup: remove css_get_next
>
> Full diffstat says:
> include/linux/cgroup.h | 7 ---
> kernel/cgroup.c | 49 ----------------
> mm/memcontrol.c | 145 ++++++++++++++++++++++++++++++++++++++++--------
> 3 files changed, 121 insertions(+), 80 deletions(-)
>
> Any suggestions are welcome of course.
>
> Thanks!
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/