[PATCH v2 0/2] Fix high I/O when memory almost reaches the memcg limit

From: Liu Shixin
Date: Fri Mar 22 2024 - 05:36:30 EST


v1->v2:
1. Replace the variable active_refault with mmap_miss. Now mmap_miss will
not be decreased if the folio was active prior to eviction.
2. Jan has given me two other patches which aim to let mmap_miss be properly
increased when the page is not ready. But in my scenario, the problem
is that the page is reclaimed immediately. These two patches have
no logical conflict with Jan's patches [3].

Recently, when installing a package in a docker container which had almost
reached its memory limit, the installer became unresponsive for more than
15 minutes. During this period, I/O stayed high (~1G/s) and affected the
whole machine. I've constructed a test case as follows:

1. Create a docker container:

$ cat test.sh
#!/bin/bash

docker rm centos7 --force

docker create --name centos7 --memory 4G --memory-swap 6G centos:7 /usr/sbin/init
docker start centos7
sleep 1

docker cp ./alloc_page centos7:/
docker cp ./reproduce.sh centos7:/

docker exec -it centos7 /bin/bash

2. Try to reproduce the problem in the container:

$ cat reproduce.sh
#!/bin/bash

while true; do
	flag=$(ps -ef | grep -v grep | grep alloc_page | wc -l)
	if [ "$flag" -eq 0 ]; then
		/alloc_page &
	fi

	sleep 30

	start_time=$(date +%s)
	yum install -y expect > /dev/null 2>&1

	end_time=$(date +%s)

	elapsed_time=$((end_time - start_time))

	echo "$elapsed_time seconds"
	yum remove -y expect > /dev/null 2>&1
done

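$ cat alloc_page.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define SIZE (1 * 1024 * 1024) /* 1 MiB */

int main(void)
{
	void *addr = NULL;
	int i;

	/* Allocate 6 GiB - 50 MiB of anonymous memory, 1 MiB at a time,
	 * leaving the container just below its memory+swap limit. */
	for (i = 0; i < 1024 * 6 - 50; i++) {
		addr = malloc(SIZE);
		if (!addr)
			return -1;

		/* Touch every byte so the pages are faulted in and charged. */
		memset(addr, 0, SIZE);
	}

	/* Hold the memory (intentionally never freed) while yum runs. */
	sleep(99999);
	return 0;
}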


We found that this problem is caused by a lot of meaningless read-ahead.
Since the container has almost reached its memory limit, the pages are
reclaimed immediately after being read ahead, which triggers read-ahead
again immediately. The program runs slowly and wastes a lot of I/O
bandwidth.

These two patches aim to break the read-ahead loop in the above scenario.

[1] https://lore.kernel.org/linux-mm/c2f4a2fa-3bde-72ce-66f5-db81a373fdbc@xxxxxxxxxx/T/
[2] https://lore.kernel.org/all/20240201100835.1626685-1-liushixin2@xxxxxxxxxx/
[3] https://lore.kernel.org/all/20240201173130.frpaqpy7iyzias5j@quack3/

Liu Shixin (2):
mm/readahead: break read-ahead loop if filemap_add_folio return -ENOMEM
mm/readahead: increase mmap_miss when folio in workingset

include/linux/pagemap.h |  2 ++
mm/filemap.c            |  7 ++++---
mm/readahead.c          | 15 +++++++++++++--
3 files changed, 19 insertions(+), 5 deletions(-)
