Re: Using the zero mapped page

William Burrow (aa126@fan.nb.ca)
Wed, 8 Jan 1997 13:57:48 -0400 (AST)


[There is a lengthy quoted section consisting of prior private messages.
The new stuff starts after the marker ***.]

On Wed, 8 Jan 1997 bofh@snoopy.virtual.net.au wrote:

> >> AFAIK the kernel can determine the value, but it would have to do some
> >> limited instruction decode to do this. Such features aren't uncommon in OSs,
> >> but I'm not sure whether Linux has them or not. If you have to implement
> >> code for 386 instruction interpretation just for this then this alone makes it
> >> a less desirable feature. When a page fault occurs the CPU aborts the
> >> instruction (at a great performance cost) and raises an exception which
> >> provides enough information to know where the code was and all that's needed to
> >> re-start it. From my theoretical knowledge of CPU design I guess that this
>
> >I suspected this might be the case; I just do not have easy access to
> >the kernel sources (Slackware does not put them on their live CDROM
> >filesystem anymore!).
>
> >> would increase the time for the no-op write by 2 or 3 orders of magnitude. So
> >> writing to real memory will be at least 100 times faster, if not 1000 times
> >> faster or more, than writing to the zero page and having the kernel swallow
> >> the writes. I believe that the performance loss
> >> in this case will be so great that it's not worth doing.
>
> >I am wondering if even this is acceptable. I had been thinking of another
> >idea, so I had already popped open my assembler reference to see what kind
> >of expense the following implementation would have:
>
> >> However an alternative solution is available. Keep in mind that the use of
> >> a page of memory only really costs us when we page it out and then back in
> >> again. What if the kernel looked at a page when it was about to page it out
> >> and then just discarded it if it was zero filled? That way a page which had
>
> >If I read the manual right, it will take 1029 cycles to verify a page is all
> >zeroes with REPE SCASD (32-bit scan) on a 486, and 8201 cycles on a 386
> >(somewhat over 0.4 milliseconds on a 386-20, I would guess; perhaps on the
> >order of tens of microseconds on a Pentium given the page is cached). This
> >would probably be much cheaper than a disk write and read, but can still be
> >expensive (see below). Pages with non-zero contents probably won't take long
> >to identify, most likely under 32 iterations (<261 cycles, or about 13 usecs,
> >on that 386-20), which is nothing compared to the cost of a swap to disk
> >(even one iteration is likely to catch a lot of pages as non-zero).

[***]

[Note: I was thinking of something totally unrelated when I said 32
iterations: most non-zero pages will likely have non-zero data in the first
few bytes, though some pathological cases (e.g. pages from a randomly sparse
matrix) might average around 512 iterations.]
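
[For concreteness, here is a rough sketch in C of the check being costed
above.  page_is_zero() is a hypothetical helper, not anything from the
kernel source; a hand-written REPE SCASD loop or a decent compiler would
produce much the same scan, a word at a time, giving up at the first
non-zero word (the common case):]

    /* Hypothetical helper: returns 1 if the 4K page at `page' contains
     * nothing but zero bytes.  Scans a 32-bit word at a time (1024 words
     * on the i386) and bails out at the first non-zero word.            */
    static int page_is_zero(const unsigned long *page)
    {
            unsigned int i;

            for (i = 0; i < 4096 / sizeof(unsigned long); i++)
                    if (page[i] != 0)
                            return 0;       /* non-zero data found early */
            return 1;                       /* every word was zero       */
    }

[The intended use would be in the swap-out path: if page_is_zero() succeeds,
discard the page and map the shared zero page in its place rather than
writing 4K of zeroes to disk.]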

> >The guy I was talking with mentioned that he had run tests on a slightly
> >loaded system (X, a couple of logins, httpd) and found that 6-10% of page
> >swaps were of zero filled pages. He claimed that he tried to implement
> >something, but that the kernel changed swap mechanisms before he even
> >completed a patch.

[The paragraph below has an interesting point on the effect of this
algorithm on the average swap time:]

> OK. If 5% of all swap requests are for zero pages, and if we were forced to
> do things sequentially (i.e. running a single memory-intensive process and
> not being able to do read-ahead and write-behind on paging requests - I
> believe that feature is about to be put in the kernel but hasn't been done
> yet), then:
> A page operation takes ~6000us to access the disk.
> Testing a page takes ~50us when it is all zeroes and ~20us on average when
> it is not. The average overhead of testing a page is .95 * 20 + .05 * 50 =
> 21.5us. The time taken to do the swap-out operation on a page is then
> 21.5 + .95 * 6000 = 5721.5us, a significant improvement over 6000us.
>
> If the system is paging continuously then a 5% reduction in the amount of
> paging will reduce the amount of time required to perform an operation by at
> least 5%, and probably much more. If a page is discarded instead of paged
> out because it's all zeros, then a new page won't have to be allocated when
> it is next referenced; avoiding that allocation also avoids swapping out
> another page that does contain non-zero data.
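
[Aside: that arithmetic restated as a throwaway C fragment, using the same
assumed figures (~6000us per disk operation, ~50us to test an all-zero page,
~20us on average to reject a non-zero one); the function name is made up:]

    /* Back-of-the-envelope expected cost of one swap-out, in microseconds,
     * as a function of the fraction of pages that turn out to be zero.    */
    double expected_swapout_us(double zero_frac)
    {
            double test = zero_frac * 50.0 + (1.0 - zero_frac) * 20.0;
            double disk = (1.0 - zero_frac) * 6000.0; /* zero pages skip disk */

            return test + disk;     /* 0.05 -> 21.5 + 5700 = 5721.5us */
    }
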
>
> >> not been used apart from writing zeros won't be paged out (and will never need
> >> to be paged back in), AND if you have been using a page and then write zeros
> >> all over it then it can go back to being the zero page! So if a process zeros
> >> all its memory then its amount of committed memory will decrease!
>
> >Sounds great, but the kernel might spend quite some time prepping pages to
> >receive zeroes again. Though significantly less time than reading from swap,
> >it is not a near 100% gain (and without a working prototype, it is hard to
> >guess what gains might be made). I have some harebrained concepts that might
> >be tried, but don't even want to mention them.
>
> What do you mean "prepping pages to receive zeros"? Do you mean zero filling
> pages when a process writes to them? What we could do is maintain a list of
> zero filled pages, so when a zero page would be paged out we instead just add
> it to the zero page list and set the process to have the zero page in place of
> that page. Then when a write to the zero page occurs we have a ready supply of
> zeroed pages.

The kernel must zero a page for security reasons whenever a new page is
to be allocated to a process. It does make sense for the kernel to prep
zero-filled pages during idle time for later use. Someone posted an ECCd
patch. I suspect a number of tasks, such as scouring pages assigned to
processes for zero-filled ones so the kernel swap routines can recover them,
and zeroing unused pages for future use, could be incorporated into some
kind of daemon that wanders through memory, somewhat as ECCd does.
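
[To make the above concrete, here is a rough sketch of how the zero-page
list and the idle-time zeroing might fit together.  It is written as a
self-contained user-space mock-up, not real kernel code; struct page_frame,
reclaim_zero_frame() and get_zeroed_frame() are names made up for
illustration.]

    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    /* Stand-in for a physical page frame; the real kernel would use
     * whatever structure its memory manager already has.               */
    struct page_frame {
            struct page_frame *next;
            unsigned char data[PAGE_SIZE];
    };

    /* Pool of frames already known to contain nothing but zeroes. */
    static struct page_frame *zero_pool;

    /* Swap-out path: a frame found to be all zeroes is not written to
     * disk; it goes onto the pool, and the shared zero page would be
     * mapped read-only in its place (copied again on the next write).  */
    static void reclaim_zero_frame(struct page_frame *pg)
    {
            pg->next = zero_pool;
            zero_pool = pg;
    }

    /* Write fault on the shared zero page: hand out a pre-zeroed frame
     * if one is available, otherwise zero a fresh one the slow way.    */
    static struct page_frame *get_zeroed_frame(void)
    {
            struct page_frame *pg = zero_pool;

            if (pg != NULL) {
                    zero_pool = pg->next;   /* already zeroed, no memset */
                    return pg;
            }
            pg = malloc(sizeof(*pg));       /* stand-in for the frame    */
            if (pg != NULL)                 /* allocator                 */
                    memset(pg->data, 0, PAGE_SIZE);
            return pg;
    }

[An idle-time daemon of the sort described above would simply memset()
otherwise-unused frames and hand them to reclaim_zero_frame(), so that the
slow path in get_zeroed_frame() is rarely taken.]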

> >> The above is just based on a few minutes' thought, and I believe that it
> >> could do with more research and consideration before being put to the
> >> linux-kernel list (which is why I just sent it to you). I don't have
>
> >Mind if I repost this message to the linux-dev list?
>
> Sure. BTW Where and what is linux-dev and how does it differ from
> linux-kernel on vger?

I meant linux-kernel, but my brain had linux-dev....

--
William Burrow  --  Fredericton Area Network, New Brunswick, Canada
Copyright 1997 William Burrow  
This line left intentionally blank.
And the one below it.