Re: Signal 7 and "Couldn't get a free page..."

Hubert Mantel (mantel@suse.de)
Wed, 30 Apr 1997 10:28:50 +0200 (MEST)


Hello,

On Tue, 29 Apr 1997, David S. Miller wrote:

> > I'm getting "Couldn't get a free page..." even on machines with
> > 32MB RAM and lots of unused swap. With 2.0.29 I never saw this on
> > the affected machines...
>
> I'm suspecting the buffer cache changes to be the real problem.
>
> Yes this is known, quick easy fix for this is to increase magic
> constant in free_area_init() initial comparison to 48, like so:
>
> /*
>  * select nr of pages we try to keep free for important stuff
>  * with a minimum of 48 pages. This is totally arbitrary
>  */
> i = (end_mem - PAGE_OFFSET) >> (PAGE_SHIFT+7);
> if (i < 48)
>         i = 48;
>
> This is what Linus and myself have in our trees.

This patch helps only a little, and only on small machines. With 32MB RAM,
the resulting values are already 64, 96 and 128, so the patch changes
nothing at all there. And one still gets "Couldn't get a free page..." on
machines with 32MB RAM, so this does not seem to be a solution for the real
problem. Increasing these values only makes the problem show up less often;
it does not make it disappear.

Ingo Molnar sent me the patch below for testing purposes. My test
machine ran the whole night compiling kernels. The problem (failure of
__get_free_page) was triggered about once per hour. But apparently,
calling schedule() solved the problem each time, as the count never went
below 100. This is probably not the correct solution, but it might show
where to look...

> David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

Hubert mantel@suse.de

The test patch:
-------------------------------------------------------------------------
diff -urN linux-2.0.30/mm/memory.c linux-2.0.30-test/mm/memory.c
--- linux-2.0.30/mm/memory.c Wed Sep 11 16:57:19 1996
+++ linux-2.0.30-test/mm/memory.c Tue Apr 29 19:33:23 1997
@@ -927,7 +927,20 @@
 anonymous_page:
 	entry = pte_wrprotect(mk_pte(ZERO_PAGE, vma->vm_page_prot));
 	if (write_access) {
-		unsigned long page = __get_free_page(GFP_KERNEL);
+		/*
+		 * this is a totally incorrect patch, as the problem
+		 * is elsewhere
+		 */
+		int count=100;
+		unsigned long page;
+repeat:
+		page = __get_free_page(GFP_KERNEL);
+		if (!page && count) {
+			printk ("Ingo's Patch was triggered with count = %d\n", count);
+			count--;
+			schedule();
+			goto repeat;
+		}
 		if (!page)
 			goto sigbus;
 		memset((void *) page, 0, PAGE_SIZE);
-------------------------------------------------------------------------

The syslog:
-------------------------------------------------------------------------
Apr 29 21:24:37 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 29 21:44:21 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 29 21:49:13 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 29 22:21:57 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 29 23:03:00 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 29 23:55:14 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 30 01:12:13 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 30 01:43:00 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 30 02:35:22 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 30 04:15:10 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 30 05:42:11 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 30 06:11:29 Celsius kernel: Ingo's Patch was triggered with count = 100
Apr 30 06:21:57 Celsius kernel: Ingo's Patch was triggered with count = 100
-------------------------------------------------------------------------