Re: [next -v1 3/5] memcg: simplify the mem_cgroup_update_lru_size function
From: Chen Ridong
Date: Fri Dec 06 2024 - 05:02:54 EST
On 2024/12/6 16:24, Hugh Dickins wrote:
> On Fri, 6 Dec 2024, Chen Ridong wrote:
>> On 2024/12/6 13:33, Yu Zhao wrote:
>>> On Thu, Dec 5, 2024 at 6:45 PM Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote:
>>>> From: Chen Ridong <chenridong@xxxxxxxxxx>
>>>>
>>>> In the `mem_cgroup_update_lru_size` function, the `lru_size` should be
>>>> updated by adding `nr_pages` regardless of whether `nr_pages` is greater
>>>> than 0 or less than 0. To simplify this function, add a check for
>>>> `nr_pages` == 0. When `nr_pages` is not equal to 0, perform the same
>>>> actions.
>>>>
>>>> Signed-off-by: Chen Ridong <chenridong@xxxxxxxxxx>
>>>
>>> NAK.
>>>
>>> The commit that added that clearly explains why it was done that way.
>
> Thanks Yu: I did spot this going by, but was indeed hoping that someone
> else would NAK it, with more politeness than I could muster at the time!
>
>>
>> Thank you for your reply.
>>
>> I have read the commit message for ca707239e8a7 ("mm: update_lru_size
>> warn and reset bad lru_size") before sending my patch. However, I did
>> not quite understand why we need to deal with the difference between
>> 'nr_pages > 0' and 'nr_pages < 0'.
>>
>>
>> The 'lru_zone_size' can only be modified in the
>> 'mem_cgroup_update_lru_size' function. Only subtracting 'nr_pages' or
>> adding 'nr_pages' in a way that causes an overflow can make the size < 0.
>>
>> For case 1, subtracting 'nr_pages', we should issue a warning if the
>> size goes below 0. For case 2, when adding 'nr_pages' results in an
>> overflow, there will be no warning and the size won't be reset to 0 the
>> first time it occurs . It might be that a warning will be issued the
>> next time 'mem_cgroup_update_lru_size' is called to modify the
>> 'lru_zone_size'. However, as the commit message said, "the first
>> occurrence is the most informative," and it seems we have missed that
>> first occurrence.
>>
>> As the commit message said: "and then the vast unsigned long size draws
>> page reclaim into a loop of repeatedly", I think that a warning should
>> be issued and 'lru_zone_size' should be reset whenever 'size < 0' occurs
>> for the first time, whether from subtracting or adding 'nr_pages'.
>
> One thing, not obvious, but important to understand, is that WARN_ONCE()
> only issues a warning the first time its condition is found true, but
> it returns the true or false of the condition every time. So lru_size
> is forced to 0 each time an inconsistency is detected.
>
Thank you for your explanation.
My patch does not change this logic.
> (But IIRC VM_WARN_ON_ONCE() does not behave in that useful way; or maybe
> I've got that macro name wrong - we have so many slightly differing.)
>
> Perhaps understanding that will help you to make better sense of the
> order of events in this function.
>
> Another thing to understand: it's called before adding folio to list,
> but after removing folio from list: when it can usefully compare whether
> the emptiness of the list correctly matches lru_size 0. It cannot do so
> when adding if you "simplify" it in the way that you did.
>
The commit b4536f0c829c ("mm, memcg: fix the active list aging for
lowmem requests when memcg is enabled") has removed the emptiness check.
What is meaningful is that we can determine whether the size is smaller
than 0 before adding `nr_pages`. However, as I mentioned, the
`lru_zone_size` can only be modified in the `mem_cgroup_update_lru_size`
function, which means that a warning must have been issued if the size
was smaller than 0 before adding `nr_pages` when `nr_pages` is greater
than 0. In this case, it will not issue a warning again.
Perhaps "when it can usefully compare whether the emptiness of the list
correctly matches lru_size 0" is not useful now.
> You might be focusing too much on the "size < 0" aspect of it, or you
> might be worrying more than I did about size growing larger and larger
> until it wraps to negative (not likely on 64-bit, unless corrupted).> I hope these remarks help, but you need to think through it again
> for yourself.
>
> Hugh
Thank you very much for your patience.
Best regards,
Ridong
>
>>
>> I am be grateful if you can explain more details, it has confused me for
>> a while. Thank you very much.
>>
>> Best regards,
>> Ridong