Re: Fwd: crash/hang in mm/swapfile.c:718 add_to_avail_list when exercising stress-ng

From: Colin King (gmail)
Date: Thu Aug 03 2023 - 10:37:43 EST


Hi Aaron,

Thanks for the speedy fix. I've run a couple of 10-minute soak tests with the fix applied and can't reproduce the issue, so it looks good to me. Please add:

Tested-by: Colin Ian King <colin.i.king@xxxxxxxxx>

Colin

On 03/08/2023 14:41, Aaron Lu wrote:
On Thu, Aug 03, 2023 at 02:06:46PM +0800, Aaron Lu wrote:
On Wed, Aug 02, 2023 at 07:54:38PM +0700, Bagas Sanjaya wrote:
Hi,

I notice a bug report on Bugzilla [1]. Quoting from it:

How to reproduce:

Had 24 CPU Alderlake 16GB debian12 system running with default kernel (from makecondig) on 6.5-rc4, exercised with no swap to start with.

using stress-ng tip commit 0f2ef02e9bc5abb3419c44be056d5fa3c97e0137
(see https://github.com/ColinIanKing/stress-ng )

build and run stress-ng for say 60 minutes:

./stress-ng --cpu-online 50 --brk 50 --swap 50 --vmstat 1 -t 60m

Will hang in mm/swapfile.c:718 add_to_avail_list+0x93/0xa0

See attached file for an image of the console on the hang (I'm trying to get the full stack dump).

See Bugzilla for the full thread and attached console image.

FWIW, I have to forward this bug report to the mailing lists because
Thorsten noted that many developers don't look at Bugzilla
(see the BZ thread).

Thanks.

I can reproduce this issue using the command line below:
$ sudo ./stress-ng --brk 50 --swap 5 --vmstat 1 -t 60m

I'll investigate what is happening.

Hi Colin,

Can you try the below diff on top of v6.5-rc4? It works for me here
although I got the warn in a different place in get_swap_pages():

	WARN(!si->highest_bit,
	     "swap_info %d in list but !highest_bit\n",
	     si->type);

I think the warning you got in add_to_avail_list(), triggered because the
swap device was already in the list, has a similar cause; see the
explanation below.

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8e6dde68b389..cb7e93ec1933 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2330,7 +2330,8 @@ static void _enable_swap_info(struct swap_info_struct *p)
 	 * swap_info_struct.
 	 */
 	plist_add(&p->list, &swap_active_head);
-	add_to_avail_list(p);
+	if (p->highest_bit)
+		add_to_avail_list(p);
 }
 
 static void enable_swap_info(struct swap_info_struct *p, int prio,

The finding is that if swapoff fails for a swap device, the device goes
back through reinsert_swap_info() -> _enable_swap_info() ->
add_to_avail_list(). The problem is that this swap device may have run
out of space, with its highest_bit being 0, and so shouldn't be added to
the avail list. In your case, once its highest_bit becomes non-zero, it
goes through add_to_avail_list() again, and since it's already in the
list, you get the warning.
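
To make the sequence above concrete, here is a small userspace sketch of it.
This is only a toy model, not kernel code: struct swap_dev, the model_*
helpers and simulate_failed_swapoff() are hypothetical names made up for
illustration, and the avail list is reduced to a single flag. It assumes the
"already in the list" check can be modelled as a counter and that freeing a
slot on a full device re-adds it to the avail list, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct swap_info_struct (hypothetical model). */
struct swap_dev {
	unsigned int highest_bit;	/* 0 means no free slot is left */
	bool on_avail_list;		/* models membership in the avail list */
	int warnings;			/* times the "already listed" check fired */
};

/* Models add_to_avail_list(): adding a listed device counts as a warning. */
static void model_add_to_avail(struct swap_dev *d)
{
	if (d->on_avail_list)
		d->warnings++;
	d->on_avail_list = true;
}

/* Models removal from the avail list when swapoff starts. */
static void model_del_from_avail(struct swap_dev *d)
{
	d->on_avail_list = false;
}

/* Models _enable_swap_info(), with or without the highest_bit check. */
static void model_enable(struct swap_dev *d, bool with_fix)
{
	if (!with_fix || d->highest_bit)
		model_add_to_avail(d);
}

/* Models a slot being freed: a previously full device is re-added. */
static void model_free_slot(struct swap_dev *d, unsigned int slot)
{
	if (slot > d->highest_bit) {
		bool was_full = !d->highest_bit;

		d->highest_bit = slot;
		if (was_full)
			model_add_to_avail(d);
	}
}

/* Returns how many warnings a failed swapoff on a full device produces. */
int simulate_failed_swapoff(bool with_fix)
{
	struct swap_dev d = { .highest_bit = 0, .on_avail_list = false, .warnings = 0 };

	model_del_from_avail(&d);	/* swapoff starts */
	assert(!d.on_avail_list);
	model_enable(&d, with_fix);	/* swapoff fails, device is re-enabled */
	model_free_slot(&d, 42);	/* later a slot is freed */
	return d.warnings;
}
```

In this model, simulate_failed_swapoff(false) counts one warning because the
full device is added on re-enable and then again when a slot is freed, while
simulate_failed_swapoff(true) counts none because the re-enable step skips
the device while its highest_bit is 0.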

If it works for you, I'll prepare a patch. Thanks.