Re: [PATCH 3/4] percpu: implement partial chunk depopulation

From: Guenter Roeck
Date: Fri Jul 02 2021 - 16:28:23 EST


On 7/2/21 12:45 PM, Dennis Zhou wrote:
Hello,

On Fri, Jul 02, 2021 at 12:11:40PM -0700, Guenter Roeck wrote:
Hi,

On Mon, Apr 19, 2021 at 10:50:46PM +0000, Dennis Zhou wrote:
From: Roman Gushchin <guro@xxxxxx>

This patch implements partial depopulation of percpu chunks.

As of now, a chunk can be depopulated only as a part of the final
destruction, if there are no more outstanding allocations. However,
to minimize memory waste it might be useful to depopulate a
partially filled chunk, if a small number of outstanding allocations
prevents the chunk from being fully reclaimed.

This patch implements the following depopulation process: it scans
over the chunk pages, looks for a range of empty and populated pages
and performs the depopulation. To avoid races with new allocations,
the chunk is isolated beforehand. After the depopulation the chunk is
sidelined to a special list or freed. New allocations prefer using
active chunks to sidelined chunks. If a sidelined chunk is used, it is
reintegrated to the active lists.

The depopulation is scheduled on the free path if the chunk meets all
of the following conditions:
1) it has more than 1/4 of total pages free and populated
2) the system has enough free percpu pages aside from this chunk
3) it isn't the reserved chunk
4) it isn't the first chunk
If it's already depopulated but got free populated pages, it's a good
target too. The chunk is moved to a special slot,
pcpu_to_depopulate_slot, chunk->isolated is set, and the balance work
item is scheduled. On isolation, these pages are removed from the
pcpu_nr_empty_pop_pages. The chunk is placed back on the
to_depopulate_slot each time it meets these qualifications.
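
The scheduling conditions above can be sketched as a small userspace
predicate. This is only an illustration of the rules as described in the
commit message, not the code from the patch: struct chunk_model,
should_depopulate() and the high_watermark parameter are hypothetical
names, and treating "enough free percpu pages" as "more than
high_watermark" is an assumption.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a percpu chunk (not the kernel struct). */
struct chunk_model {
    int nr_pages;            /* total pages in the chunk */
    int nr_empty_pop_pages;  /* populated pages holding no allocation */
    bool isolated;           /* chunk was already isolated/depopulated */
    bool is_first;           /* this is the first chunk */
    bool is_reserved;        /* this is the reserved chunk */
};

/*
 * Returns true if the chunk should be queued for depopulation.
 * system_empty_pop_pages: empty populated pages across all chunks.
 * high_watermark: assumed threshold for "enough free percpu pages".
 */
static bool should_depopulate(const struct chunk_model *c,
                              int system_empty_pop_pages,
                              int high_watermark)
{
    /* Conditions 3 and 4: never touch the first or reserved chunk. */
    if (c->is_first || c->is_reserved)
        return false;

    /* An already depopulated chunk that regained free populated pages. */
    if (c->isolated && c->nr_empty_pop_pages > 0)
        return true;

    /* Condition 1: more than 1/4 of the pages are free and populated. */
    if (c->nr_empty_pop_pages * 4 <= c->nr_pages)
        return false;

    /* Condition 2: enough empty pages remain aside from this chunk. */
    return system_empty_pop_pages - c->nr_empty_pop_pages > high_watermark;
}
```

For example, a 16-page chunk with 8 empty populated pages qualifies when
the system has 20 such pages and the assumed watermark is 4, but not when
the system has only 10.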

pcpu_reclaim_populated() iterates over the to_depopulate_slot until it
becomes empty. The depopulation is performed in the reverse direction to
keep populated pages close to the beginning. Depopulated chunks are
sidelined to preferentially avoid them for new allocations. When no
active chunk can satisfy a new allocation, sidelined chunks are
checked first before creating a new chunk.
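
The reverse scan can be modelled in userspace roughly as follows. This is
a sketch under stated assumptions, not the body of
pcpu_reclaim_populated(): the two boolean arrays and the function name
depopulate_reverse() are hypothetical, and locking, isolation and the
actual page freeing are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: populated[i] means page i is backed by memory,
 * in_use[i] means some allocation still touches page i.
 * Walks the chunk from the last page towards the first, clears every
 * run of populated-but-unused pages, and returns how many were cleared.
 */
static size_t depopulate_reverse(bool *populated, const bool *in_use,
                                 size_t nr_pages)
{
    size_t freed = 0;
    size_t end = nr_pages;  /* exclusive upper bound of the unscanned part */

    while (end > 0) {
        /* Skip pages that are unpopulated or still in use. */
        if (!populated[end - 1] || in_use[end - 1]) {
            end--;
            continue;
        }

        /* Extend the run of empty populated pages downwards. */
        size_t start = end - 1;
        while (start > 0 && populated[start - 1] && !in_use[start - 1])
            start--;

        /* "Depopulate" the range [start, end). */
        for (size_t i = start; i < end; i++)
            populated[i] = false;
        freed += end - start;
        end = start;
    }
    return freed;
}
```

Scanning from the end means that, if the work is cut short, the pages
still populated are the ones nearest the start of the chunk.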

Signed-off-by: Roman Gushchin <guro@xxxxxx>
Co-developed-by: Dennis Zhou <dennis@xxxxxxxxxx>
Signed-off-by: Dennis Zhou <dennis@xxxxxxxxxx>

This patch results in a number of crashes and other odd behavior
when trying to boot mips images from Megasas controllers in qemu.
Sometimes the boot stalls, but I also see various crashes.
Some examples and bisect logs are attached.

Ah, this doesn't look good.. Do you have a reproducer I could use to
debug this?


I copied the relevant information to http://server.roeck-us.net/qemu/mips/.

run.sh - qemu command (I tried with qemu 6.0 and 4.2.1)
rootfs.ext2 - root file system
config - complete configuration
defconfig - shortened configuration
vmlinux - a crashing kernel image (v5.13-7637-g3dbdb38e2869, with above configuration)

Interestingly, the crash doesn't always happen at the same location, even
with the same image. Some memory corruption, maybe?

Hope this helps. Please let me know if I can provide anything else.

Thanks,
Guenter