Re: swap cache

Prasun Kapoor (prasun@wipinfo.soft.net)
Mon, 21 Dec 1998 19:46:55 -0500 (GMT)


hi,

> A pte for a page which is not present (ie. which is on swap) is not
> RO; the CPU context bits in the pte are not used if the page-present
> bit is clear. A pte for a swapped-out page is not readonly: it can't
> be, because the RO bit is not defined in that case.

Yep!

I guess this was the crux of my doubt. Also the VM description you have
given below is most interesting.

thanks,
prasun

>
> > In this context what exactly is meant by loading a page in RO mode, when
> > the PTE for it is already marked RO?
>
> What I mean is that when you take a page fault to such a swapped page,
> the page is loaded into memory but its pte is marked RO.
>
> There is a short-cut here: if the page fault was for write access, and
> the swap page is unshared (no other processes refer to the same swap
> location), then we bypass this mechanism and give the process
> immediate RW access to the page.
>
> In all other cases, the process gets RO access, even if the page fault
> was a write fault: a write access to a shared page not yet in physical
> memorythus brings two independent copies of the page into ram; one for
> the swap cache (where other processes can find it, thus taking
> advantage of the expensive disk IO we've just done), and the other for
> the faulting process's private copy of the data.
>
> > Ummmm I dont think I'd agree. UNIXWARE, and a few other folks do just that
> > with a linked list in the page structure. Anyhow its difficult to comment
> > without looking at linux implementation about merits and demerits of
> > maintaining all accessors to a page. Its sounds like quite a deviation
> > from the popular SVR4 design!
>
> It is, very much so. That traditional page table design is a very
> low-level feature of the VM, integrated early on in the design and
> hard to change once implemented, so it's not surprising that most
> Unixes do this pretty much the same way. And yes, adding extra data
> structures for location of related ptes is possible, but it does make
> the entire memory-mapping mechanism a bit more heavyweight. It's
> something on our list to look at in 2.3, but for other reasons: it is
> not clear whether it would make any difference to raw swapping
> performance.
>
> >> Linux VM, but we don't need that functionality: in Linux we have always
> >> swapped out ptes on a per-page-table basis, not on a per-physical-page
> >> basis. We *DO* swap processes out independently.
>
> > I dont think I understand. The whole idea of doing it globally is to do
> > some kind of LRU on a global scale and swap out pages based on that.
> > Others unixes also swap processes out independently, but only in extreme
> > cases and even then if we can figure out all pte's refering to page, its
> > quite okay to invalidate them all and swap the page out.
>
> Linux does achieve global LRU; it just achieves it somewhat
> differently. The normal clock algorithm walking over memory works
> just as you'd see in a normal Unix, but it only releases page cache ,
> swap cache and buffer cache pages. It does not unmap pages from VM
> page tables. We have a separate clock algorithm which walks page
> tables and ages the pages there, unmapping pages which have not been
> accessed recently. Only once the page is completely unmapped by any
> process can the page be truly recycled.
>
> This mechanism turns out to be more efficient in its use of memory
> cache, as we are not following chains of pte indirection for each page
> we traverse, but obviously for shared pages or sparse page tables we
> may do a bit more work than we need to.
>
> However, there is a second advantage of this scheme which is
> overwhelming. It clearly distinguishes between paging activity and
> filesystem activity (even though the same page cache is used for both,
> and even though from most other points of view our VM is truly
> "unified"). I've done a very great deal of benchmarking on this topic
> over the past 3 or 4 years on Linux, and as far as memory reclamation
> is concerned, I am now absolutely convinced that a unified page
> stealer is quite simply much less efficient than the asymmetric scheme
> used by Linux. In every case, any attempt to retune the VM so that we
> recycle unmapped pages from the page cache (ie. filesystem cache
> pages) via exactly the same rules as mapped pages (ie. VM pages)
> results in poorer performance.
>
> Currently the Linux VM is more aggressive at reaping pure cache pages
> than VM pages. Quite simply, all the evidence is that this goes much
> faster.
>
> A final advantage of doing pageout from the page table point of view
> is that it makes managing a process's RSS very very easy. We can
> easily do LRU pageout of a process's ptes if it exceeds its RSS,
> leaving the pages in cache to be minor-faulted back in later if the
> memory isn't required by somebody else. We do not currently make much
> use of this facility, but it has been an important motivation in
> preserving the current scheme.
>
> --Stephen
>

_____________________________________________________________________________
"If seven maids with seven mops, Swept it for half a
Prasun Kapoor year, do you suppose" the Walrus said, "That they
Wipro Infotech Ltd. could get it clear"."I doubt it" said the Carpenter,
Ph. 5530084/34/36/53 and shed a bitter tear
Extn. 1052
"Through the looking glass and what alice found
there" - Alice in Wonderland -

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/