Re: [PATCH -v2 4/6] memcg: make sure that memcg is not offline when charging

From: Tejun Heo
Date: Wed Feb 05 2014 - 10:42:16 EST


Hello, guys.

On Wed, Feb 05, 2014 at 10:28:21AM -0500, Johannes Weiner wrote:
> I thought more about this and talked to Tejun as well. He told me
> that the rcu grace period between disabling tryget and calling
> css_offline() is currently an implementation detail of the refcounter
> that css uses, but it's not a guarantee. So my initial idea of

Yeah, that's an implementation detail coming from how percpu_ref is
implemented at the moment. Also, it's a sched_rcu grace period, not a
normal one. The only RCU-related guarantee that cgroup core gives is
that there will be a full RCU grace period between css's ref reaching
zero and invocation of ->css_free() so that it's safe to do
css_tryget() inside RCU critical sections.

In short, offlining is *not* protected by RCU. Freeing is.

> Well, css_free() is the callback invoked when the ref counter hits 0,
> and that is a guarantee. From a memcg perspective, it's the right
> place to do reparenting, not css_offline().

So, css_offline() is cgroup telling controllers two things.

* The destruction of the css, which will commence when the css ref
reaches zero, has been initiated. If you're holding any long-term css
refs for caching and stuff, put them so that destruction can actually
happen.

* Any css_tryget() attempts which haven't finished yet are guaranteed
to fail. (there's no implied RCU protection here)

Maybe offline is a bit of a misnomer. It's really just telling the
controllers to get prepared to be destroyed.

> Here is the only exception to the above: swapout records maintain
> permanent css references, so they prevent css_free() from running.
> For that reason alone we should run one optimistic reparenting in
> css_offline() to make sure one swap record does not pin gigabytes of
> pages in an offlined cgroup, which is unreachable for reclaim. But
> the reparenting for *correctness* is in css_free(), not css_offline().

A more canonical use case can be found in blkcg. blkcg holds "cache"
css refs for optimization in the indexing data structure. On offline,
blkcg purges those refs so that those stale cache refs don't put off
actual destruction for too long. But yeah, the above sounds like a
plausible use case too.

Thanks.

--
tejun