Re: [ANNOUNCE] Minneapolis Cluster Summit, July 29-30

From: Nick Piggin
Date: Mon Jul 26 2004 - 23:09:26 EST

Next message: Ram Pai: "Re: [PATCH] fix readahead breakage for sequential after randomreads"
Previous message: Joel Becker: "Re: Autotune swappiness01"
In reply to: Daniel Phillips: "Re: [ANNOUNCE] Minneapolis Cluster Summit, July 29-30"
Next in thread: Daniel Phillips: "Re: [ANNOUNCE] Minneapolis Cluster Summit, July 29-30"

Daniel Phillips wrote:

On Monday 12 July 2004 22:31, Nick Piggin wrote:

Search for rt_task in mm/page_alloc.c

Ah, interesting idea: realtime tasks get to dip into the PF_MEMALLOC reserve, until it gets down to some threshold, then they have to give up and wait like any other unwashed nobody of a process. _But_ if there's a user space process sitting in the writeout path and some other realtime process eats the entire realtime reserve, everything can still grind to a halt.

So it's interesting for realtime, but does not solve the userspace PF_MEMALLOC inversion.
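For readers following along, the rt_task behaviour described above can be modelled roughly as follows. This is a minimal userspace sketch, not the code in mm/page_alloc.c: the struct, the function name may_allocate() and the one-quarter fraction are assumptions chosen only to illustrate the idea of a lowered watermark.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of an allocator watermark check.
 * Names and fractions are illustrative, not taken from the kernel. */
struct alloc_task {
    bool rt;        /* task is in a realtime scheduling class */
    bool memalloc;  /* stands in for PF_MEMALLOC */
};

/* Returns true if the task may take `want` pages from a zone that
 * currently has `free_pages` free and a normal minimum of `pages_min`. */
static bool may_allocate(const struct alloc_task *t,
                         unsigned long free_pages,
                         unsigned long pages_min,
                         unsigned long want)
{
    unsigned long floor = pages_min;

    if (t->memalloc)
        floor = 0;                /* may drain the reserve entirely */
    else if (t->rt)
        floor -= pages_min / 4;   /* assumed fraction: dips partway in */

    return free_pages >= floor + want;
}
```

In this model a realtime task succeeds a little below the normal minimum and then fails like everybody else, which is exactly why a greedy realtime task can still exhaust the part of the reserve it is allowed to reach.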

Not the rt_task thing, because yes, you can have other RT tasks that aren't
small and bounded that screw up your reserves.

But a PF_MEMALLOC userspace task is still useful.

A privileged syscall which allows a task to mark itself as one which
cleans memory would make sense.
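A rough model of what such a call might do, assuming a simple privilege check in front of setting the flag. No such syscall is being quoted here: mark_memory_cleaner(), the struct and the flag value are all hypothetical stand-ins.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel's PF_MEMALLOC bit; the value is illustrative. */
#define PF_MEMALLOC_MODEL 0x0800UL

/* Hypothetical task record for the sketch. */
struct cleaner_task {
    unsigned long flags;
    bool privileged;    /* models a capability check */
};

/* Mark the task as one that cleans memory, if it is allowed to ask. */
static int mark_memory_cleaner(struct cleaner_task *t)
{
    if (!t->privileged)
        return -EPERM;              /* unprivileged callers are refused */
    t->flags |= PF_MEMALLOC_MODEL;
    return 0;
}
```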

For now we can do it with an ioctl, and we pretty much have to do it for
pvmove. But that's when user space drives the kernel by syscalls; there
is also the nasty (and common) case where the kernel needs userspace to
do something for it while it's in PF_MEMALLOC. I'm playing with ideas
there, but nothing I'm proud of yet. For now I see the in-kernel
approach as the conservative one, for anything that could possibly find
itself on the VM writeout path.

You'd obviously want to make the PF_MEMALLOC task as tight as possible,
and running mlocked:

Not just tight, but bounded. And tight too, of course.

I don't particularly see why such a task would be any safer in-kernel.

The PF_MEMALLOC flag is inherited down a call chain, not across a pipe or similar IPC to user space.

This is no different in-kernel, of course. You would have to think about
which threads need the flag and which do not. Even better, you might
acquire and drop the flag only when required. I can't see any obvious
problems you would run into.
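To make "acquire and drop the flag only when required" concrete, here is a small sketch of a save-and-restore pattern. It is a userspace model under stated assumptions: the struct, the helper names memalloc_enter() and memalloc_exit(), and the flag value are invented for illustration.

```c
/* Stand-in for the kernel's PF_MEMALLOC bit; the value is illustrative. */
#define PF_MEMALLOC_MODEL 0x0800UL

/* Hypothetical task record for the sketch. */
struct flag_task {
    unsigned long flags;
};

/* Set the flag and report whether it was already set, so that
 * nested sections do not clear it on behalf of an outer one. */
static unsigned long memalloc_enter(struct flag_task *t)
{
    unsigned long was_set = t->flags & PF_MEMALLOC_MODEL;

    t->flags |= PF_MEMALLOC_MODEL;
    return was_set;
}

/* Clear the flag only if the matching enter call was the one that set it. */
static void memalloc_exit(struct flag_task *t, unsigned long was_set)
{
    if (!was_set)
        t->flags &= ~PF_MEMALLOC_MODEL;
}

/* Hold the flag only around the part that must not block on reclaim.
 * Returns the flags as seen inside the protected section. */
static unsigned long write_out_batch(struct flag_task *t)
{
    unsigned long saved = memalloc_enter(t);
    unsigned long seen = t->flags;  /* the I/O submission would go here */

    memalloc_exit(t, saved);
    return seen;
}
```

The point of returning the previous state is that the same helper works whether or not the caller already holds the flag.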

PF_MEMALLOC tasks won't enter page reclaim at all. The only way they
will reach the writeout path is if you are write(2)ing stuff (you may
hit synchronous writeout).

That's the problem.

Well, I don't think it would be a problem to get the write throttling path
to ignore PF_MEMALLOC tasks, if that is what you need. Again, this shouldn't
be any different from in-kernel code.
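Something along these lines is what I have in mind, shown as a userspace model rather than a patch. The function must_throttle(), the struct and the flag value are assumptions for the sketch, and the real throttling path looks at more than a single dirty-page count.

```c
#include <stdbool.h>

/* Stand-in for the kernel's PF_MEMALLOC bit; the value is illustrative. */
#define PF_MEMALLOC_MODEL 0x0800UL

/* Hypothetical record for a task calling write(2). */
struct writer_task {
    unsigned long flags;
};

/* Decide whether a writer should be stalled into synchronous writeout. */
static bool must_throttle(const struct writer_task *w,
                          unsigned long dirty_pages,
                          unsigned long dirty_limit)
{
    if (w->flags & PF_MEMALLOC_MODEL)
        return false;   /* a memory cleaner is never stalled here */

    return dirty_pages > dirty_limit;
}
```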
