Re: [RFC 2/2] x86_64: expand kernel stack to 16K

From: Minchan Kim
Date: Fri May 30 2014 - 02:20:41 EST

In reply to: Linus Torvalds: "Re: [RFC 2/2] x86_64: expand kernel stack to 16K"
Next in thread: Linus Torvalds: "Re: [RFC 2/2] x86_64: expand kernel stack to 16K"

On Thu, May 29, 2014 at 06:24:02PM -0700, Linus Torvalds wrote:
> On Thu, May 29, 2014 at 5:50 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
> >>
> >> You could also try Dave's patch, and _not_ do my mm/vmscan.c part.
> >
> > Sure. While I was writing this, Rusty's test crashed, so I will try Dave's patch,
> > then yours except the vmscan.c part.
>
> Looking more at Dave's patch (well, description), I don't think there
> is any way in hell we can ever apply it. If I read it right, it will
> cause all IO that overflows the max request count to go through the
> scheduler to get it flushed. Maybe I misread it, but that's definitely
> not acceptable. Maybe it's not noticeable with a slow rotational
> device, but modern ssd hardware? No way.
>
> I'd *much* rather slow down the swap side. Not "real IO". So I think
> my mm/vmscan.c patch is preferable (but yes, it might require some
> work to make kswapd do better).
>
> So you can try Dave's patch just to see what it does for stack depth,
> but other than that it looks unacceptable unless I misread things.
>
> Linus

I tested the patch below, and the result is an endless OOM loop even though
there are lots of anon pages and plenty of free swap space.

My guess is that once the VM has dropped most of the file-backed pages,
__alloc_pages_direct_reclaim cannot make progress on the remaining anon
pages, so the allocation falls through to OOM.
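
To make that guess concrete, here is a minimal user-space sketch of the
check that the mm/vmscan.c hunk changes. It is only a model, not kernel
code: struct model_page, skip_writeback_old(), skip_writeback_new() and
writable_by_direct_reclaim() are hypothetical names, and the model assumes
that every page it sees is dirty and has to be written out (to disk or to
swap) before it can be freed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dirty page seen by shrink_page_list(). */
struct model_page {
	bool file_backed;	/* models page_is_file_cache(page) */
};

/* Check before the patch: only file-cache pages can be skipped. */
static bool skip_writeback_old(const struct model_page *p,
			       bool is_kswapd, bool zone_reclaim_dirty)
{
	return p->file_backed && (!is_kswapd || !zone_reclaim_dirty);
}

/* Check after the patch: the file-cache test is gone, anon pages are skipped too. */
static bool skip_writeback_new(const struct model_page *p,
			       bool is_kswapd, bool zone_reclaim_dirty)
{
	(void)p;	/* page type no longer matters */
	return !is_kswapd || !zone_reclaim_dirty;
}

typedef bool (*skip_fn)(const struct model_page *, bool, bool);

/* Count the dirty pages a direct reclaimer (never kswapd) may still write out. */
static size_t writable_by_direct_reclaim(const struct model_page *pages,
					 size_t n, skip_fn skip,
					 bool zone_reclaim_dirty)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (!skip(&pages[i], false, zone_reclaim_dirty))
			count++;
	return count;
}
```

In this model, with the old check a direct reclaimer can still write out
every anon page. With the new check it can write out none of them, whatever
the zone state is, which would fit the behaviour described above once the
file-backed pages are gone.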

---
 mm/backing-dev.c | 25 +++++++++++++++----------
 mm/vmscan.c      |  4 +---
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index ce682f7a4f29..2762b16404bd 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -11,6 +11,7 @@
 #include <linux/writeback.h>
 #include <linux/device.h>
 #include <trace/events/writeback.h>
+#include <linux/blkdev.h>

 static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0);

@@ -565,6 +566,18 @@ void set_bdi_congested(struct backing_dev_info *bdi, int sync)
 }
 EXPORT_SYMBOL(set_bdi_congested);

+static long congestion_timeout(int sync, long timeout)
+{
+	long ret;
+	DEFINE_WAIT(wait);
+	wait_queue_head_t *wqh = &congestion_wqh[sync];
+
+	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
+	ret = schedule_timeout(timeout);
+	finish_wait(wqh, &wait);
+	return ret;
+}
+
 /**
  * congestion_wait - wait for a backing_dev to become uncongested
  * @sync: SYNC or ASYNC IO
@@ -578,12 +591,8 @@ long congestion_wait(int sync, long timeout)
 {
 	long ret;
 	unsigned long start = jiffies;
-	DEFINE_WAIT(wait);
-	wait_queue_head_t *wqh = &congestion_wqh[sync];

-	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
-	ret = io_schedule_timeout(timeout);
-	finish_wait(wqh, &wait);
+	ret = congestion_timeout(sync, timeout);

 	trace_writeback_congestion_wait(jiffies_to_usecs(timeout),
 					jiffies_to_usecs(jiffies - start));
@@ -614,8 +623,6 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
 {
 	long ret;
 	unsigned long start = jiffies;
-	DEFINE_WAIT(wait);
-	wait_queue_head_t *wqh = &congestion_wqh[sync];

 	/*
 	 * If there is no congestion, or heavy congestion is not being
@@ -635,9 +642,7 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
 	}

 	/* Sleep until uncongested or a write happens */
-	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
-	ret = io_schedule_timeout(timeout);
-	finish_wait(wqh, &wait);
+	ret = congestion_timeout(sync, timeout);

 out:
 	trace_writeback_wait_iff_congested(jiffies_to_usecs(timeout),
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a9c74b409681..e4ad7cd1885b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -975,9 +975,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * avoid risk of stack overflow but only writeback
 			 * if many dirty pages have been encountered.
 			 */
-			if (page_is_file_cache(page) &&
-					(!current_is_kswapd() ||
-					 !zone_is_reclaim_dirty(zone))) {
+			if (!current_is_kswapd() || !zone_is_reclaim_dirty(zone)) {
 				/*
 				 * Immediately reclaim when written back.
 				 * Similar in principal to deactivate_page()
--
1.9.2

--
Kind regards,
Minchan Kim