Re: Swapping-bug

Amnon Shiloh (amnons@cs.huji.ac.il)
Thu, 12 Nov 1998 19:32:35 +0200 (IST)


On Sun, 1 Nov 1998, Andrea Arcangeli wrote:

> Subject: Re: Swapping-bug
>
> On Sun, 1 Nov 1998, Ariel Rosenblatt wrote:
>
> >If a process that requires more memory than available and therefore causes
> >many of its pages to be swapped (could be around half of the total memory),
> >terminates or releases its memory while its pages are still scheduled for
> >delayed-write to the disk, all those pages will remain in the swap_cache
> >(and even with a high "age" value), so that although nobody needs those
> >pages, which should go to the free-list, they hang around there
> >indefinitely until other processes run out of free memory.
> >When that eventually happens, only then those pages begin to age, while
> >good pages are stolen from other processes in the meanwhile.
>
> In the latest 2.1 kernels swap cache pages are not aged anymore (just
> because the problem you are reporting).
>

Indeed, the 2.1.126 fixes reduced the severity of the problem,
because instead of being aged, those pages are immediately removed
from the swap-cache.

There still remains a smaller problem, because "kswapd" (or any other
process running "do_try_to_free_page()") may unnecessarily free other
used-pages and/or resources ("shm_swap", "swap_out", "shrink_dcache_memory",
or even "kmem_cache_reap") before the clock of "shrink_mmap" reaches the
pages in question. In fact, there may actually be plenty of available memory,
in which case "do_try_to_free_page" should not have been called in the first
place.

> >The reason is that while pages are scheduled to be written to disk, their
>
> No this should be not the reason. The reason is that if the swap cache
> page is shared not in memory but in the swap, the swap cache page remains
> there when the process will die.

A small, trivial test program that obviously uses no page sharing
shows that this can happen even when the pages were never shared:

char m[80*1024*1024]; /* anything >= size of the main memory */

int main(void)
{
        unsigned long i;
        int j;

        /* touch one byte in every 4096-byte page, two passes */
        for (j = 0; j < 2; j++)
                for (i = 0; i < sizeof(m); i += 4096)
                        m[i]++;
        return 0;
}

After running this program on an otherwise idle computer, there are
lots of pages in the swap-cache with a count of 1 (held only by the
swap-cache itself). Those pages really belong on the free-list.
If, instead, we allow some time for the pages to be written to the hard disk
before the process exits, then no such pages are found.
This can be shown by adding a "sleep" at the end:

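#include <unistd.h>     /* sleep() */

char m[80*1024*1024]; /* anything >= size of the main memory */

int main(void)
{
        unsigned long i;
        int j;

        /* touch one byte in every 4096-byte page, two passes */
        for (j = 0; j < 2; j++)
                for (i = 0; i < sizeof(m); i += 4096)
                        m[i]++;
        sleep(6);       /* give the delayed writes time to complete */
        return 0;
}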

The pages in question are totally unused and no longer needed, and can
easily be freed once their count drops to 1 as the page arrives at the disk.

The following simple fix in "fs/buffer.c" was tested and shown to solve the
problem. As the comment (which is longer than the fix itself) states, we had
to be very careful with regard to races on SMP machines:
---------------------------------------------------------

 /* Run the hooks that have to be done when a page I/O has completed. */
 static inline void after_unlock_page (struct page * page)
 {
         if (test_and_clear_bit(PG_decr_after, &page->flags)) {
                 atomic_dec(&nr_async_pages);
 #ifdef DEBUG_SWAP
                 printk ("DebugVM: Finished IO on page %p, nr_async_pages %d\n",
                         (char *) page_address(page),
                         atomic_read(&nr_async_pages));
 #endif
         }
         if (test_and_clear_bit(PG_swap_unlock_after, &page->flags))
+        {
+                /* When a SwapCache page remains with only one reference,
+                 * eg. just for being on the swap-cache, it should be freed.
+                 * We like to do this below, after freeing the page,
+                 * but once the page is unlocked, there can be a race on
+                 * an SMP, causing "kswapd" to crash while deleting the
+                 * page from swap cache simultaneously with this
+                 * interrupt-routine, so we must do it before unlocking.
+                 */
+                if(PageFreeAfter(page) && PageSwapCache(page) &&
+                   atomic_read(&page->count) == 2)
+                        delete_from_swap_cache(page);
                 swap_after_unlock_page(page->offset);
+        }
         if (test_and_clear_bit(PG_free_after, &page->flags))
                 __free_page(page);
 }

---------------------------------------------------------------
Amnon Shiloh - the MOSIX group, Hebrew University of Jerusalem.
