Re: [PATCH 2/3] vmscan: make mapped executable pages the first class citizen
From: Wu Fengguang
Date: Tue May 19 2009 - 01:11:28 EST
On Tue, May 19, 2009 at 12:41:38PM +0800, KOSAKI Motohiro wrote:
> Hi
>
> Thanks for the great work.
>
>
> > SUMMARY
> > =======
> > The patch decreases the number of major faults from 50 to 3 during 10% cache hot reads.
> >
> >
> > SCENARIO
> > ========
> > The test scenario is to do 100000 pread(size=110 pages, offset=(i*100) pages),
> > where 10% of the pages will be activated:
> >
> > for i in `seq 0 100 10000000`; do echo $i 110; done > pattern-hot-10
> > iotrace.rb --load pattern-hot-10 --play /b/sparse
>
>
> Where can I download iotrace.rb?
In the attachment. It relies on some ruby libraries.
> > and monitor /proc/vmstat during that time. The test box has 2GB of memory.
> >
> >
> > ANALYSIS
> > ========
> >
> > I carried out two runs on a freshly booted 2.6.29 in console mode with the
> > VM_EXEC patch, and fetched the vmstat numbers at three points:
> >
> > (1) begin:   shortly after the big read IO starts;
> > (2) end:     just before the big read IO stops;
> > (3) restore: after the big read IO stops and the zsh working set is restored
> >
> >           nr_mapped  nr_active_file  nr_inactive_file  pgmajfault  pgdeactivate    pgfree
> > begin:         2481            2237              8694         630             0    574299
> > end:            275          231976            233914         633        776271  20933042
> > restore:        370          232154            234524         691        777183  20958453
> >
> > begin:         2434            2237              8493         629             0    574195
> > end:            284          231970            233536         632        771918  20896129
> > restore:        399          232218            234789         690        774526  20957909
> >
> > and another run on 2.6.30-rc4-mm with the VM_EXEC logic disabled:
>
> I don't think that is a proper comparison.
> You need one of the following comparisons; otherwise we inject a lot of guesswork into the analysis.
>
> - 2.6.29 with and without VM_EXEC patch
> - 2.6.30-rc4-mm with and without VM_EXEC patch
I think it doesn't matter that much when it comes to "relative" numbers.
But anyway I guess you want to try a more typical desktop ;)
Unfortunately, Xorg is currently broken on my test box...
> >
> >           nr_mapped  nr_active_file  nr_inactive_file  pgmajfault  pgdeactivate    pgfree
> > begin:         2479            2344              9659         210             0    579643
> > end:            284          232010            234142         260        772776  20917184
> > restore:        379          232159            234371         301        774888  20967849
> >
> > The numbers show that
> >
> > - The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29.
> >   I'd attribute that to the mmap readahead improvements :-)
> >
> > - The pgmajfault increment during the big read IO is 633-630=3 vs 260-210=50.
> >   That's a huge improvement, which means that with the VM_EXEC protection logic,
> >   active mmap pages are pretty safe even under partially cache hot streaming IO.
> >
> > - When the active:inactive file lru size reaches 1:1, their scan rates are 1:20.8
> >   under 10% cache hot IO (computed from Dpgdeactivate:Dpgfree).
> >   That roughly means the active mmap pages get 20.8 times more chances to be
> >   re-referenced and thus stay in memory.
> >
> > - The absolute nr_mapped drops considerably, to about 1/9, during the big IO,
> >   and the dropped pages are mostly inactive ones. The patch has almost no impact
> >   in this respect, which means it won't unnecessarily increase memory pressure.
> >   (In contrast, your 20% mmap protection ratio would keep them all, and
> >   therefore eliminate the extra 41 major faults needed to restore the working
> >   set of zsh etc.)
>
> I'm surprised by this.
> Why doesn't your patch protect mapped pages from streaming IO?
It is only protecting the *active* mapped pages, as expected.
But yes, the active percent is much lower than expected :-)
> I'd really like to reproduce this myself; please tell me how to reproduce it.
OK.
Firstly:
for i in `seq 0 100 10000000`; do echo $i 110; done > pattern-hot-10
dd if=/dev/zero of=/tmp/sparse bs=1M count=1 seek=1024000
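(Each 110-page read starts 100 pages after the previous one, so the last 10
pages of each read are read again at the start of the next one -- that overlap
is where the ~10% of cache hot, activated pages comes from.)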
Then boot into desktop and run concurrently:
iotrace.rb --load pattern-hot-10 --play /tmp/sparse
vmmon nr_mapped nr_active_file nr_inactive_file pgmajfault pgdeactivate pgfree
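(vmmon is a small C program whose source is attached below; assuming you save
it as vmmon.c, it should build and run with something like

    gcc -Wall -o vmmon vmmon.c
    ./vmmon -d 5 nr_mapped nr_active_file nr_inactive_file pgmajfault pgdeactivate pgfree

where -d 5 merely picks a 5-second sampling interval.)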
Note that I was creating the sparse file on btrfs, which happens to be
very slow at reading sparse files:
151.194384MB/s 284.198252s 100001x 450560b --load pattern-hot-10 --play /b/sparse
In that case, the inactive list is rotated at a speed of 250MB/s,
so a full scan of it takes about 3.5 seconds, while a full scan
of the active file list takes about 77 seconds.
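(For reference, assuming 4KB pages: ~234000 inactive file pages is roughly
0.9GB, which at ~250MB/s indeed cycles in about 3.5 seconds; the active file
list is about the same size but is scanned at roughly 1/20 of that rate,
hence the ~77 seconds.)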
Attached is the source code for both of the above tools.
Thanks,
Fengguang
Attachment: iotrace.rb (application/ruby)
/*
 * vmmon: periodically sample selected /proc/vmstat counters and print
 * either the raw values or their per-interval rates.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/time.h>

static int raw = 1;       /* print raw counter values instead of rates */
static int delay = 1;     /* sampling interval in seconds */
static int nr_fields;     /* number of /proc/vmstat fields to monitor */
static char **fields;     /* names of the monitored fields */
static FILE *f;           /* open handle on /proc/vmstat */

/* Read /proc/vmstat and pick out the values of the requested fields. */
static void acquire(long *values)
{
        char buf[1024];

        rewind(f);
        memset(values, 0, nr_fields * sizeof(*values));

        while (fgets(buf, sizeof(buf), f)) {
                int i;

                for (i = 0; i < nr_fields; i++) {
                        char *p;

                        if (strncmp(buf, fields[i], strlen(fields[i])))
                                continue;
                        p = strchr(buf, ' ');
                        if (p == NULL) {
                                fprintf(stderr, "vmmon: error parsing /proc\n");
                                exit(1);
                        }
                        values[i] += strtoul(p, NULL, 10);
                        break;
                }
        }
}

/* Print one sample: raw counter values, or per-second deltas vs the previous sample. */
static void display(long *new_values, long *prev_values,
                    unsigned long long usecs)
{
        int i;

        for (i = 0; i < nr_fields; i++) {
                if (raw)
                        printf(" %16ld", new_values[i]);
                else {
                        long long diff;
                        double ddiff;

                        ddiff = new_values[i] - prev_values[i];
                        ddiff *= 1000000;
                        ddiff /= usecs;
                        diff = ddiff;
                        printf(" %16lld", diff);
                }
        }
        printf("\n");
}

/* Sleep for one interval, take a new sample and print it. */
static void do1(long *prev_values)
{
        struct timeval start;
        struct timeval end;
        long long usecs;
        long new_values[nr_fields];

        gettimeofday(&start, NULL);
        sleep(delay);
        gettimeofday(&end, NULL);
        acquire(new_values);

        usecs = end.tv_sec - start.tv_sec;
        usecs *= 1000000;
        usecs += end.tv_usec - start.tv_usec;

        display(new_values, prev_values, usecs);
        memcpy(prev_values, new_values, nr_fields * sizeof(*prev_values));
}

static void heading(void)
{
        int i;

        printf("\n");
        for (i = 0; i < nr_fields; i++)
                printf(" %16s", fields[i]);
        printf("\n");
}

/* Main loop: reprint the heading every 24 output lines. */
static void doit(void)
{
        int line = 0;
        long prev_values[nr_fields];

        acquire(prev_values);
        for ( ; ; ) {
                if (line == 0)
                        heading();
                do1(prev_values);
                line++;
                if (line == 24)
                        line = 0;
        }
}

static void usage(void)
{
        fprintf(stderr, "usage: vmmon [-r] [-d N] field [field ...]\n");
        fprintf(stderr, " -d N : delay N seconds\n");
        fprintf(stderr, " -r : show raw numbers instead of diff\n");
        exit(1);
}

int main(int argc, char *argv[])
{
        int c;

        while ((c = getopt(argc, argv, "rd:")) != -1) {
                switch (c) {
                case 'r':
                        raw = 1;
                        break;
                case 'd':
                        delay = strtol(optarg, NULL, 10);
                        break;
                default:
                        usage();
                }
        }
        if (optind == argc)
                usage();

        nr_fields = argc - optind;
        fields = argv + optind;

        f = fopen("/proc/vmstat", "r");
        if (f == NULL) {
                perror("vmmon: /proc/vmstat");
                exit(1);
        }
        doit();
        exit(0);
}