Re: Re: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem

From: Abel Wu
Date: Tue May 30 2023 - 05:58:49 EST

On 5/30/23 5:12 AM, Shakeel Butt wrote:

+linux-mm and cgroups

On Mon, May 29, 2023 at 07:58:45PM +0800, Abel Wu wrote:
Hi Shakeel, thanks for reviewing! And sorry for replying so late,
I was on a vacation :)

On 5/25/23 9:22 AM, Shakeel Butt wrote:
On Mon, May 22, 2023 at 03:01:21PM +0800, Abel Wu wrote:
For now __sk_mem_raise_allocated() mainly considers global socket
memory pressure and allows the allocation to proceed if no global
pressure is observed, even for sockets whose memcgs are under
pressure, which might result in longer memcg memstalls.

So take net-memcg's pressure into consideration when allocating
socket memory to alleviate long tail latencies.

Signed-off-by: Abel Wu <wuyun.abel@xxxxxxxxxxxxx>

Hi Abel,

Have you seen any real world production issue which is fixed by this
patch or is it more of a fix after reading code?

The latter. But we do observe one common case in the production env:
a p2p service, which mainly downloads container images, running
inside a container with a tight memory limit can easily be throttled
and stay memstalled for a long period of time, and is sometimes even
OOM-killed. This service shows bursty usage of TCP memory, and I
think it indeed needs to suppress sockmem allocation if the memcg is
already under pressure. The memcg pressure is usually caused by too
many page caches, with the dirty ones being written back to slow
backends. So it makes little sense to keep receiving net data and
consuming even more memory.

We actually made an intentional decision to not throttle the incoming
traffic under memory pressure. See 720ca52bcef22 ("net-memcg: avoid
stalls when under memory pressure"). If you think the throttling
behavior is preferred for your application, please propose the patch
separately and we can work on how to enable flexible policy here.

Ah I see. Thanks for providing the context. So suppressing the
allocation under memcg pressure could further keep senders waiting
if SACKed segments get dropped from the OFO queue.

This code is quite subtle and small changes can cause unintended
behavior changes. At the moment the tcp memory accounting and memcg
accounting is intermingled and I think we should decouple them.

My original intention to post this patchset is to clarify that:

- proto pressure only considers sysctl_mem[] (patch 2)
- memcg pressure only indicates the pressure inside itself
- consider both whenever allocation or reclaim is needed (patch 1,3)

In this way, the two kinds of pressure maintain purer semantics, and
the socket core can react to both of them properly and consistently.

Can you please resend your patch series (without patch 3) and Cc
linux-mm, the cgroups list, and the memcg maintainers as well?

Yeah, absolutely.