Re: [patch] my latest oom stuff

Andrea Arcangeli (andrea@e-mind.com)
Tue, 27 Oct 1998 09:50:32 +0100 (CET)


On Mon, 26 Oct 1998, Linus Torvalds wrote:

>I just found a case that could certainly result in endless page faults,
>and an endless stream of __get_free_page() calls. It's been there forever,

Cool ;-). This could be the cause of the total, unrecoverable deadlock I
hit yesterday when I drove my machine to 0k of free RAM and 0k of free
swap. Thinking about it a bit more, I think the kernel should always be
able to recover from such a real, total OOM too.

>I've not seen this behaviour myself, but it could have caused Andrea's
>problems, especially the harder to find ones. Andrea, can you check this
>patch (against clean 2.1.126) out and see if it makes any difference to
>your testing?

It's compiling right now. I've seen the patch and I think that my
implementation of kswapd is superior.

[booted pre-2.1.127-2]

No, it still deadlocks totally while the system still has 1 Mbyte free.
kswapd runs all the time. I think it's because you always set ->counter =
high_prio when waking up kswapd. Note also that if I send a SAK or a
kill-all (except init) via SysRq, nothing happens, because no process
except kswapd has a chance to get scheduled and kswapd continues to run
all the time.
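
The starvation described above can be illustrated with a small userspace
model. This is only a sketch, not kernel code: DEF_PRIORITY = 20,
pick_next() and user_ticks() are assumptions made for the example, and
the scheduler is reduced to "the runnable task with the larger counter
runs next". The one detail taken from the patch below is the removed
kswapd_wakeup(), which set p->counter = p->priority << priority on every
swap tick.

```c
/* Assumed value for illustration only. */
#define DEF_PRIORITY 20

struct task {
    long counter;   /* remaining time slice, in ticks */
    long priority;  /* value used to refill the time slice */
};

/* Simplified stand-in for goodness(): larger counter wins, ties go to a. */
static struct task *pick_next(struct task *a, struct task *b)
{
    return (b->counter > a->counter) ? b : a;
}

/*
 * Run `ticks` timer ticks with kswapd and one user task, both runnable.
 * If refill_shift >= 0, every tick resets kswapd's counter to
 * priority << refill_shift, as the removed kswapd_wakeup() did.
 * Returns the number of ticks the user task was given.
 */
static int user_ticks(int ticks, int refill_shift)
{
    struct task kswapd = { 0, (DEF_PRIORITY * 2) / 3 };
    struct task user = { DEF_PRIORITY, DEF_PRIORITY };
    int ran = 0;

    for (int t = 0; t < ticks; t++) {
        if (refill_shift >= 0)
            kswapd.counter = kswapd.priority << refill_shift;

        /* Both slices used up: refill them from the priorities. */
        if (kswapd.counter == 0 && user.counter == 0) {
            kswapd.counter = kswapd.priority;
            user.counter = user.priority;
        }

        struct task *next = pick_next(&kswapd, &user);
        if (next == &user)
            ran++;
        if (next->counter > 0)
            next->counter--;
    }
    return ran;
}
```

In this model user_ticks(1000, 3) gives the user task no ticks at all,
because kswapd's counter is topped up before it can ever drop below the
user's, while user_ticks(1000, -1) (no per-tick refill) lets the user
task run.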

I ported my latest kswapd, which works great here, to your
pre-2.1.127-2. At least kswapd itself is fixed by my patch (so when I
press SAK via SysRq I can recover from the OOM).

The __get_free_pages()/try_to_free_pages() issue is equally important for
avoiding the deadlock, but it is completely unrelated to the kswapd issue.
The problem with the current __get_free_pages() is that we go ahead if
__GFP_HIGH or __GFP_MED is set (practically, whenever we are in GFP_ATOMIC
or GFP_KERNEL context, which seem to be the only GFP_* values used, and
they are used everywhere). Should we be calling
__get_free_pages(GFP_USER) when we need a page for an anonymous mapping or
for a swapin?
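
A rough sketch of the check being discussed, for readers without the
source at hand. The bit values and the way GFP_ATOMIC, GFP_KERNEL and
GFP_USER are composed here are assumptions for illustration and are not
copied from 2.1.126; goes_ahead_after_failed_reclaim() is a hypothetical
helper, not a kernel function. It only models the decision "reclaim freed
nothing, does the allocation still proceed?".

```c
/* Assumed bit layout, for illustration only. */
#define __GFP_WAIT 0x01
#define __GFP_LOW  0x02
#define __GFP_MED  0x04
#define __GFP_HIGH 0x08
#define __GFP_IO   0x10

/* Assumed composition of the common allocation contexts. */
#define GFP_ATOMIC (__GFP_HIGH)
#define GFP_USER   (__GFP_LOW | __GFP_WAIT | __GFP_IO)
#define GFP_KERNEL (__GFP_MED | __GFP_WAIT | __GFP_IO)

/*
 * Hypothetical helper: returns 1 if an allocation with this mask keeps
 * going even though trying to free pages made no progress, 0 if it
 * gives up at that point.
 */
static int goes_ahead_after_failed_reclaim(int gfp_mask)
{
    return (gfp_mask & (__GFP_MED | __GFP_HIGH)) != 0;
}
```

Under these assumptions GFP_ATOMIC and GFP_KERNEL requests both go ahead
and only a GFP_USER request backs off, which is why the choice of mask
for anonymous mappings and swapins matters.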

Here is my new kswapd against pre-2.1.127-2. I tried it and kswapd works
properly now (and I can now restore the system via SysRq).

Index: mm/vmscan.c
===================================================================
RCS file: /var/cvs/linux/mm/vmscan.c,v
retrieving revision 1.1.1.5
diff -u -r1.1.1.5 vmscan.c
--- vmscan.c 1998/10/27 07:42:59 1.1.1.5
+++ linux/mm/vmscan.c 1998/10/27 08:19:30
@@ -495,6 +495,30 @@
printk ("Starting kswapd v%.*s\n", i, s);
}

+#define kswapd_renice(freemem) \
+ (kswapd_task->priority = kswapd_priority(freemem))
+
+#define kswapd_done(freemem) \
+ (freemem == 2 && buffer_under_max() && pgcache_under_max())
+
+#define kswapd_schedule() \
+ if (kswapd_task->need_resched) \
+ schedule();
+
+static void kswapd_engine(void)
+{
+ for (;;)
+ {
+ int free_memory;
+ do_try_to_free_page(0);
+ free_memory = free_memory_available();
+ if (kswapd_done(free_memory))
+ break;
+ kswapd_renice(free_memory);
+ kswapd_schedule();
+ }
+}
+
/*
* The background pageout daemon.
* Started as a kernel thread from the init process.
@@ -514,13 +538,6 @@
lock_kernel();

/*
- * Set the base priority to something smaller than a
- * regular process. We will scale up the priority
- * dynamically depending on how much memory we need.
- */
- current->priority = (DEF_PRIORITY * 2) / 3;
-
- /*
* Tell the memory management that we're a "memory allocator",
* and that if we need more memory we should get access to it
* regardless (see "try_to_free_pages()"). "kswapd" should
@@ -537,20 +554,16 @@
init_swap_timer();
kswapd_task = current;
while (1) {
- unsigned long start_time;
-
- current->state = TASK_INTERRUPTIBLE;
+/* run_task_queue(&tq_disk); */
flush_signals(current);
- run_task_queue(&tq_disk);
+ /*
+ * Remember to re-enable the swap tick before going to sleep.
+ */
+ timer_active |= 1<<SWAP_TIMER;
+ current->state = TASK_INTERRUPTIBLE;
schedule();
swapstats.wakeups++;
-
- start_time = jiffies;
- do {
- do_try_to_free_page(0);
- if (free_memory_available() > 1)
- break;
- } while (jiffies != start_time);
+ kswapd_engine();
}
/* As if we could ever get here - maybe we want to make this killable */
kswapd_task = NULL;
@@ -587,59 +600,21 @@
return retval;
}

-/*
- * Wake up kswapd according to the priority
- * 0 - no wakeup
- * 1 - wake up as a low-priority process
- * 2 - wake up as a normal process
- * 3 - wake up as an almost real-time process
- *
- * This plays mind-games with the "goodness()"
- * function in kernel/sched.c.
- */
-static inline void kswapd_wakeup(int priority)
-{
- if (priority) {
- struct task_struct *p = kswapd_task;
- if (p) {
- p->counter = p->priority << priority;
- wake_up_process(p);
- }
- }
-}
-
/*
* The swap_tick function gets called on every clock tick.
*/
void swap_tick(void)
{
- unsigned int pages;
- int want_wakeup;
-
+ int free_memory = free_memory_available();
/*
* Schedule for wakeup if there isn't lots
* of free memory or if there is too much
* of it used for buffers or pgcache.
- *
- * "want_wakeup" is our priority: 0 means
- * not to wake anything up, while 3 means
- * that we'd better give kswapd a realtime
- * priority.
*/
- want_wakeup = 0;
- if (buffer_over_max() || pgcache_over_max())
- want_wakeup = 1;
- pages = nr_free_pages;
- if (pages < freepages.high)
- want_wakeup = 1;
- if (pages < freepages.low)
- want_wakeup = 2;
- if (pages < freepages.min)
- want_wakeup = 3;
-
- kswapd_wakeup(want_wakeup);
-
- timer_active |= (1<<SWAP_TIMER);
+ if (free_memory != 2 || buffer_over_max() || pgcache_over_max())
+ kswapd_wakeup(free_memory);
+ else
+ timer_active |= (1<<SWAP_TIMER);
}

/*
Index: include/linux/mm.h
===================================================================
RCS file: /var/cvs/linux/include/linux/mm.h,v
retrieving revision 1.1.1.4
diff -u -r1.1.1.4 mm.h
--- mm.h 1998/10/27 07:42:43 1.1.1.4
+++ linux/include/linux/mm.h 1998/10/27 08:20:37
@@ -329,10 +329,33 @@
*/
extern int free_memory_available(void);
extern struct task_struct * kswapd_task;
-#define wakeup_kswapd() do { \
- if (kswapd_task->state & TASK_INTERRUPTIBLE) \
- wake_up_process(kswapd_task); \
-} while (0)
+
+static inline long kswapd_priority(int free_memory)
+{
+ long priority;
+ switch (free_memory)
+ {
+ case 0:
+ priority = DEF_PRIORITY << 1;
+ break;
+ case 2:
+ priority = DEF_PRIORITY >> 1;
+ break;
+ default:
+ priority = DEF_PRIORITY;
+ }
+ return priority;
+}
+
+static inline void kswapd_wakeup(int free_memory)
+{
+ struct task_struct *p = kswapd_task;
+ if (p)
+ {
+ p->priority = kswapd_priority(free_memory);
+ wake_up_process(p);
+ }
+}

/* vma is the first one with address < vma->vm_end,
* and even address < vma->vm_start. Have to extend vma. */
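
The control flow of kswapd_engine() and kswapd_priority() from the patch
above can be tried out in userspace with the sketch below. It is a model
under stated assumptions, not the patch itself: DEF_PRIORITY = 20, the
low_mark/high_mark thresholds, free_memory_level() and run_engine() are
hypothetical, each pass is assumed to free exactly one page, and the
buffer and page cache checks in kswapd_done() are assumed to pass.

```c
/* Assumed value for illustration only. */
#define DEF_PRIORITY 20

/* Same mapping as kswapd_priority() in the patch. */
static long kswapd_priority(int free_memory)
{
    switch (free_memory) {
    case 0:
        return DEF_PRIORITY << 1;   /* nothing free: highest priority */
    case 2:
        return DEF_PRIORITY >> 1;   /* plenty free: lowest priority */
    default:
        return DEF_PRIORITY;
    }
}

/* Hypothetical stand-in for free_memory_available(): returns 0, 1 or 2. */
static int free_memory_level(int free_pages, int low_mark, int high_mark)
{
    if (free_pages < low_mark)
        return 0;
    return (free_pages < high_mark) ? 1 : 2;
}

struct engine_result {
    int passes;     /* loop iterations before the engine stopped */
    long priority;  /* last priority assigned by the renice step */
};

/* Model of the kswapd_engine() loop: free, check, renice, repeat. */
static struct engine_result run_engine(int free_pages, int low_mark,
                                       int high_mark, long start_priority)
{
    struct engine_result r = { 0, start_priority };

    for (;;) {
        free_pages++;   /* stand-in for do_try_to_free_page(0) */
        r.passes++;
        int level = free_memory_level(free_pages, low_mark, high_mark);
        if (level == 2)
            break;      /* models kswapd_done() */
        r.priority = kswapd_priority(level);
    }
    return r;
}
```

Starting from 0 free pages with low_mark = 4 and high_mark = 8, the model
runs at the doubled priority while below the low mark, drops back to
DEF_PRIORITY above it, and stops on the pass that reaches the high mark.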

Andrea Arcangeli
