Re: [PATCH v3 28/30] mm: memcontrol: prepare for reparenting state_local
From: Harry Yoo
Date: Sun Feb 01 2026 - 22:18:57 EST
On Fri, Jan 30, 2026 at 03:22:20PM +0800, Qi Zheng wrote:
> On 1/29/26 8:23 PM, Harry Yoo wrote:
> > On Thu, Jan 29, 2026 at 04:50:39PM +0800, Qi Zheng wrote:
> > > On 1/29/26 10:10 AM, Harry Yoo wrote:
> > > > On Mon, Jan 19, 2026 at 11:34:53AM +0800, Qi Zheng wrote:
> > > > > On 1/18/26 11:20 AM, Shakeel Butt wrote:
> > > > > > On Wed, Jan 14, 2026 at 07:32:55PM +0800, Qi Zheng wrote:
> > > > > Since these two functions require memcg or lruvec, they are already
> > > > > within the critical section of the RCU lock.
> > > >
> > > > What happens if someone grabs a refcount, releases the rcu read
> > > > lock before percpu refkill, and then calls mod_memcg[_lruvec]_state()?
> > > >
> > > > In this case, can we end up reparenting in the middle of a
> > > > non-hierarchical stat update, because there is no RCU grace period
> > > > between them?
> > > >
> > > > Something like
> > > >
> > > > T1                                   T2
> > > >
> > > >                                      - rcu_read_lock()
> > > >                                      - get memcg refcnt
> > > >                                      - rcu_read_unlock()
> > > >
> > > >                                      - call mod_memcg_state()
> > > >                                      - CSS_IS_DYING is not set
> > > > - Set CSS_IS_DYING
> > > > - Trigger percpu refkill
> > > >
> > > > - Trigger offline_css()
> > > >   -> reparent non-hierarchical       - update non-hierarchical stats
> > > >      stats
> > > >                                      - put memcg refcount
> > >
> > > Good catch, I think you are right.
> > >
> > > The rcu lock should be added to mod_memcg_state() and
> > > mod_memcg_lruvec_state().
> >
> > Thanks for confirming!
> >
> > Because it's quite confusing, let me ask a few more questions...
> >
> > Q1. Yosry mentioned [1] [2] that stat updates should be done in the same
> > RCU section that is used to grab a refcount of the cgroup.
> >
> > But I don't think your work is relying on that. Instead, I guess, it's
> > relying on the CSS_DYING check from the reader side to determine whether it
>
> Relying only on the CSS_DYING check is insufficient. Otherwise, the
> following race may occur:
>
> T1                                   T2
>
>                                      memcg_is_dying is false
> Set CSS_IS_DYING
> reparent non-hierarchical            update non-hierarchical stats for child
>
> So IIUC we should add rcu lock, then:
Right, RCU here is used to enforce ordering between reparenting
non-hierarchical stats and stat updates for the child.
> T1                                   T2
>
>                                      rcu_read_lock
>                                      memcg_is_dying is false
>
> Set CSS_IS_DYING
>                                      update non-hierarchical stats for child
>                                      rcu_read_unlock
>
> synchronize_rcu or rcu work
>   --> reparent non-hierarchical
>
> > should update stats of the child or parent memcg, right?
> >
> > -> That being said, when rcu_read_lock() is called _after_ stats are
> > reparented, the reader must observe that the CSS_DYING flag is set.
> >
> > [1] https://lore.kernel.org/all/utl6esq7jz5e4f7kwgrpwdjc2rm3yi33ljb6xkm5nxzufa4o7s@hblq2piu3vnz
> > [2] https://lore.kernel.org/all/ebdhvcwygvnfejai5azhg3sjudsjorwmlcvmzadpkhexoeq3tb@5gj5y2exdhpn
> >
> > Q2. When a reader checks CSS_DYING flag, how is the flag change
> > guaranteed to be visible to the reader without any lock, memory barrier,
> > or atomic ops involved?
>
> The main situation requiring the CSS_DYING check is as follows:
>
> T1                                   T2
>
> Set CSS_IS_DYING
>
> synchronize_rcu or rcu work
>   --> reparent non-hierarchical
>                                      rcu_read_lock()
>                                      memcg_is_dying is true
>                                      update non-hierarchical stats for parent
>
> Referring to the "Memory-Barrier Guarantees" section in [1],
Today I learned a little bit about RCU's requirements :)
> synchronize_rcu() can guarantee that T2 can see CSS_IS_DYING. Right?
Right. It is fair to assume that a read-side critical section either

1) ends before synchronize_rcu() does; in this case the reader may or
   may not see the CSS_DYING flag, but it doesn't matter because the
   reader finishes before the stats are reparented, or

2) starts after synchronize_rcu() starts; in this case all stores made
   before the grace period began must propagate to all CPUs before the
   critical section starts, so such a reader is guaranteed to see the
   CSS_DYING flag set.
Linux Kernel Memory Model [2] explicitly explains 1) and 2):
> RCU (Read-Copy-Update) is a powerful synchronization mechanism. It
> rests on two concepts: grace periods and read-side critical sections.
>
> A grace period is the span of time occupied by a call to
> synchronize_rcu(). A read-side critical section (or just critical
> section, for short) is a region of code delimited by rcu_read_lock()
> at the start and rcu_read_unlock() at the end. Critical sections can
> be nested, although we won't make use of this fact.
>
> As far as memory models are concerned, RCU's main feature is its
> Grace-Period Guarantee, which states that a critical section can never
> span a full grace period. In more detail, the Guarantee says:
>
> For any critical section C and any grace period G, at least
> one of the following statements must hold:
>
> (1) C ends before G does, and in addition, every store that
> propagates to C's CPU before the end of C must propagate to
> every CPU before G ends.
>
> (2) G starts before C does, and in addition, every store that
> propagates to G's CPU before the start of G must propagate
> to every CPU before C starts.
^ this
[2] https://docs.kernel.org/dev-tools/lkmm/docs/explanation.html
> [1] https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.rst
> Qi
--
Cheers,
Harry / Hyeonggon