Re: [PATCH 2/3] mm: memcontrol: correct the parameter type of __mod_memcg{_lruvec}_state()
From: Harry Yoo (Oracle)
Date: Wed Mar 25 2026 - 03:42:39 EST
On Wed, Mar 25, 2026 at 03:26:46PM +0800, Qi Zheng wrote:
>
>
> On 3/25/26 1:17 PM, Harry Yoo (Oracle) wrote:
> > On Wed, Mar 25, 2026 at 11:25:06AM +0800, Qi Zheng wrote:
> > > On 3/25/26 9:43 AM, Harry Yoo (Oracle) wrote:
> > > > On Tue, Mar 24, 2026 at 07:31:28PM +0800, Qi Zheng wrote:
> > > > > From: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
> > > > >
> > > > > The __mod_memcg_state() and __mod_memcg_lruvec_state() were used to
> > > > > reparent non-hierarchical stats, the values passed to them might exceed
> > > > > the upper limit of the type int, so correct the val parameter type of them
> > > > > to long.
> > > > >
> > > > > Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
> > > > > ---
> > > > > include/trace/events/memcg.h | 10 +++++-----
> > > > > mm/memcontrol.c | 8 ++++----
> > > > > 2 files changed, 9 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > index 7fb9cbc10dfbb..4a78550f6174e 100644
> > > > > --- a/mm/memcontrol.c
> > > > > +++ b/mm/memcontrol.c
> > > > > @@ -527,7 +527,7 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
> > > > > #ifdef CONFIG_MEMCG_V1
> > > > > static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn,
> > > > > - enum node_stat_item idx, int val);
> > > > > + enum node_stat_item idx, long val);
> > > > > void reparent_memcg_lruvec_state_local(struct mem_cgroup *memcg,
> > > > > struct mem_cgroup *parent, int idx)
> > > > > @@ -784,7 +784,7 @@ static int memcg_page_state_unit(int item);
> > > > > * Normalize the value passed into memcg_rstat_updated() to be in pages. Round
> > > > > * up non-zero sub-page updates to 1 page as zero page updates are ignored.
> > > > > */
> > > > > -static int memcg_state_val_in_pages(int idx, int val)
> > > > > +static long memcg_state_val_in_pages(int idx, long val)
> > > > > {
> > > > > int unit = memcg_page_state_unit(idx);
> > > >
> > > > Sashiko AI made an interesting argument [1] that this could lead to
> > > > incorrectly returning a very large positive number. Let me verify that.
> > > >
> > > > [1] https://sashiko.dev/#/patchset/cover.1774342371.git.zhengqi.arch%40bytedance.com
> > > >
> > > > Sashiko wrote:
> > > > > Does this change inadvertently break the handling of negative byte-sized
> > > > > updates?
> > > > > Looking at the rest of the function:
> > > > > if (!val || unit == PAGE_SIZE)
> > > > > return val;
> > > > > else
> > > > > return max(val * unit / PAGE_SIZE, 1UL);
> > > >
> > > > > PAGE_SIZE is defined as an unsigned long.
> > > >
> > > > Right, it's defined as 1UL << PAGE_SHIFT.
> > > >
> > > > > When val is negative, such as during uncharging of byte-sized stats like
> > > > > MEMCG_ZSWAP_B, the expression val * unit is a negative long.
> > > >
> > > > Right.
> > > >
> > > > > Dividing a signed long by an unsigned long causes the signed long to be
> > > > > promoted to unsigned before division,
> > > >
> > > > Right.
> > > >
> > > > > resulting in a massive positive
> > > > > number instead of a small negative one.
> > > >
> > > > Let's look at an example (assuming unit is 1).
> > > >
> > > > val = val * unit = -16384 (-16 KiB)
> > > > val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
> > > > max(0x3FFFFFFFFFFFFF, 1UL) = 0x3FFFFFFFFFF
> > > >
> > > > Yeah, that's a massive positive number.
> > > >
> > > > Hmm but how did it work when it was int?
> > > >
> > > > val = val * unit = -16384 (-16KiB)
> > > > val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
> > > > max(val * unit / PAGE_SIZE, 1UL) = 0x3FFFFFFFFFFFFF
> > > > (int)0x3FFFFFFFFFFFFF = 0xFFFFFFFF = (-1)
> > > >
> > > > That's incorrect. It should have been -4?
> > > >
> > > > > Before this change, the function returned an int, which implicitly truncated
> > > > > the massive unsigned 64-bit result to a 32-bit int, accidentally yielding the
> > > > > correct negative arithmetic value.
> > > >
> > > > So... "accidentally yielding the correct negative arithemetic value"
> > > > is wrong.
> > > >
> > > > Sounds like it's been subtly broken even before this patch and nobody
> > > > noticed.
> > >
> > > Thank you for such a detailed analysis! And I think you are right.
> >
> > No problem ;)
> >
> > > The memcg_state_val_in_pages() is only to make @val to be in pages, so
> > > perhaps we can avoid the above problem by taking the absolute value
> > > first?
>
> For more clearly, here is the diff:
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6491ca74b3398..2b34805b01476 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -787,11 +787,17 @@ static int memcg_page_state_unit(int item);
> static long memcg_state_val_in_pages(int idx, long val)
> {
> int unit = memcg_page_state_unit(idx);
> + unsigned long uval;
> + long res;
>
> if (!val || unit == PAGE_SIZE)
> return val;
> - else
> - return max(val * unit / PAGE_SIZE, 1UL);
> +
> + uval = val < 0 ? -(unsigned long)val : (unsigned long)val;
We could use abs() macro in linux/math.h here :)
val should not be LONG_MIN, but it should be fine
to assume it won't reach LONG_MIN.
> +
> + res = max(mult_frac(uval, unit, PAGE_SIZE), 1UL);
> +
> + return val < 0 ? -res : res;
> }
>
> #ifdef CONFIG_MEMCG_V1
>
> > That would work for memcg_rstat_updated(),
> > but not for trace_mod_memcg_state()?
>
> Why? The trace_mod_memcg_state() seems to accept 'long' type,
> am I missing something?
Nevermind. Without the diff, I misunderstood what you meant.
Thanks!
--
Cheers,
Harry / Hyeonggon