RE: How to read-protect a vm_area?

Noel Burton-Krahn (noel@harleystreet.com)
Thu, 19 Feb 1998 16:03:41 -0800


Will the changes you are proposing allow me to determine the set of all
ptes (in all tasks) which reference a page in memory?

I am working on something like distributed shared memory where a page
shared by many processes may be invalidated without a local process writing
to it. To keep everything up-to-date I have to invalidate all local ptes
which point to such a page. It sure would be nice if your changes made
this possible.
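For concreteness, here is a minimal userspace sketch of the kind of per-page reverse map this would need. Every name in it (pte_chain, page_add_ref, page_invalidate_all) is invented for illustration; nothing like this exists as a kernel interface today:

```c
/* Hypothetical sketch: a per-page chain of back-pointers to the ptes
 * that map it.  All names here are invented for illustration. */
#include <assert.h>
#include <stdlib.h>

struct pte { unsigned long val; };          /* stand-in for a hardware pte */

struct pte_chain {                          /* one node per mapping of the page */
    struct pte *pte;
    struct pte_chain *next;
};

struct page {
    struct pte_chain *chain;                /* head of the reverse-map chain */
};

/* Record that *pte maps this page. */
static void page_add_ref(struct page *pg, struct pte *pte)
{
    struct pte_chain *pc = malloc(sizeof(*pc));
    pc->pte = pte;
    pc->next = pg->chain;
    pg->chain = pc;
}

/* Invalidate every pte that maps the page: exactly the operation a
 * distributed-shared-memory layer needs when a remote node takes
 * the page away.  Returns the number of ptes cleared. */
static int page_invalidate_all(struct page *pg)
{
    int n = 0;
    for (struct pte_chain *pc = pg->chain; pc; pc = pc->next) {
        pc->pte->val = 0;                   /* clear the "present" bit */
        n++;
    }
    return n;
}
```

The point is only that, given such a chain, invalidation becomes a single list walk instead of a scan over every task's page tables.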

--Noel

-----Original Message-----
From: Stephen C. Tweedie [SMTP:sct@dcs.ed.ac.uk]
Sent: Thursday, February 19, 1998 3:22 PM
To: Rik van Riel
Cc: Linus Torvalds; Itai Nahshon; Alan Cox; paubert@iram.es;
linux-kernel@vger.rutgers.edu; Stephen Tweedie; Benjamin LaHaise; Ingo
Molnar
Subject: Re: How to read-protect a vm_area?

On Tue, 17 Feb 1998 10:28:36 +0100 (MET), Rik van Riel
<H.H.vanRiel@fys.ruu.nl> said:

> On Mon, 16 Feb 1998, Linus Torvalds wrote:
>> On Tue, 17 Feb 1998, Itai Nahshon wrote:
>> > When free memory became low the pager started to mark pages as
>> > "not present" for the hardware (but not swap them out yet), and
>> > put these pages on the free page list.
>>
>> Linux does some similar tricks with the dirty bits and the "swap
>> cache". Not the same, but the basic ideas are fairly similar.

> It would be _very_ nice to do the 'inactive-list-thingy'
> with Linux too.

I'm working on it, and expect to have more code to show very shortly.

The first round --- the diffs in 2.1.79 --- unified the page cache and
swap cache, with the intention that it will allow us to cache pages
which have been cleaned and (optionally) written to swap, but which
have not yet been actually removed from physical memory (they have
only been removed from process page tables).

The current state of play is that I have a patch which appears
reliable and which uses the swap cache for both read and write,
allowing us to swap in and out a page which is shared by multiple
processes. The swap cache mechanism is used to ensure that if any one
process swaps back in one of these shared pages, all of the other
processes can find that page in physical memory even though their ptes
still point to a swap entry on disk.
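The lookup logic can be sketched in userspace terms like this; the names (swap_cache_lookup, swap_cache_add, swap_in) and the toy hash table are illustrative only, not the kernel's actual data structures:

```c
/* Userspace sketch of the swap-in path described above: look the swap
 * entry up in the cache first, and only touch the disk on a miss, so
 * a page swapped back in by one process is found in memory by all the
 * others.  Names and structures are invented for illustration. */
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 64

struct cached_page {
    unsigned long entry;   /* swap entry this in-memory page belongs to */
    int present;
};

static struct cached_page swap_cache[CACHE_SLOTS];

static struct cached_page *swap_cache_lookup(unsigned long entry)
{
    struct cached_page *p = &swap_cache[entry % CACHE_SLOTS];
    return (p->present && p->entry == entry) ? p : NULL;
}

static struct cached_page *swap_cache_add(unsigned long entry)
{
    struct cached_page *p = &swap_cache[entry % CACHE_SLOTS];
    p->entry = entry;
    p->present = 1;
    return p;
}

/* Swap-in: the first faulting process misses and would issue the disk
 * read; later faults on the same entry find the page already cached. */
static struct cached_page *swap_in(unsigned long entry, int *did_io)
{
    struct cached_page *p = swap_cache_lookup(entry);
    if (p) {
        *did_io = 0;
        return p;
    }
    *did_io = 1;                  /* the real code reads from swap here */
    return swap_cache_add(entry);
}
```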

I've been testing this with a stresser program which allocates a large
area of memory, initialises it to a known pattern, and then forks
multiple processes which test that memory randomly.  If you use a much
larger memory area than there is free physical memory, then it gets
swapped out pretty rapidly.  With the current patch, I can do this
with multiple concurrent processes sharing the same page, and only one
page of swap is used per page of shared heap (on older kernels, you'd
get separate swap space allocated for each process in the group).
I've also tested with other forked processes writing randomly to the
same heap --- you can see the copy-on-write forcing the allocated swap
to grow slowly as the writer processes gradually diverge their working
set.
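A cut-down version of that kind of stresser looks like this. The sizes here are deliberately tiny so it runs anywhere; the real test uses an area much larger than free physical memory to force swapping:

```c
/* Sketch of the stress test described above: fill a heap region with a
 * known pattern, fork children that probe it at random offsets, and
 * fail if any byte has been corrupted.  Scale AREA up well past free
 * RAM to exercise the swap path. */
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define AREA   (1 << 20)           /* 1 MB here; far more in a real run */
#define KIDS   4
#define PROBES 10000

static unsigned char pattern(size_t i)
{
    return (unsigned char)(i * 37 + 11);   /* arbitrary known pattern */
}

int stress(void)
{
    unsigned char *mem = malloc(AREA);
    for (size_t i = 0; i < AREA; i++)
        mem[i] = pattern(i);

    for (int k = 0; k < KIDS; k++) {
        if (fork() == 0) {                 /* child: random read checks */
            srand(getpid());
            for (int p = 0; p < PROBES; p++) {
                size_t i = (size_t)rand() % AREA;
                if (mem[i] != pattern(i))
                    _exit(1);              /* corruption detected */
            }
            _exit(0);
        }
    }

    int failures = 0, status;
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    free(mem);
    return failures;                       /* 0 when all children pass */
}
```

Because the children only read, every page stays copy-on-write shared; adding writer children (as described above) makes the COW copies, and hence the allocated swap, diverge gradually.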

I'll release the patches as soon as I've brought them up to date with
the small vm changes in 2.1.87 (I'm using .85) and finished updating
some of the locking code: given that I'm now using a unique swap cache
page for any IO to a swap file, having a separate swap lock map is
redundant and in fact complicates some of the code unnecessarily.

The current code already includes an optional nowait parameter when
reading swap pages, so we can truly swap in asynchronously if we want
to. The next step after this lot is released will be to do the
MAP_ANONYMOUS|MAP_SHARED code: the current code already handles the
hard case where we have both MAP_SHARED and (copy-on-write) shared
MAP_PRIVATE pages in the same vma region, so dealing with vmas where
all pages are known to be shared will be a relatively easy step
forward.

Once I've done that the plan is to make the page cache inode linkage
structures SMP- and IRQ-safe. Once that is done, we can maintain
exactly what you describe --- an inactive list of page cache entries
(now including swap cache pages) which can be held in memory but which
we can allocate from rapidly even from within an interrupt. This will
help to minimise the amount of truly free memory that we need to keep
around; we'll be able to do write-ahead to swap but keep the pages
cached for future use without discarding them unnecessarily from
memory just to keep space for atomic memory allocations.

> Linus' patch makes most of this possible, but how do I do the
> pagecache swap-cache thingy?

Please do let me know if you want to start hacking around with this
code --- we probably want to coordinate with some of the other VM
things happening at the moment (in particular, things like Ingo's swap
prediction and the dirty page caching suggestions).

Cheers,
Stephen.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
